epub not downloaded (just title) #59

Open
ciapecki opened this issue Dec 4, 2018 · 21 comments

Comments

@ciapecki

ciapecki commented Dec 4, 2018

I am trying to download the book by providing a cookie (I am logged in to the site in my browser through my company's SSO):

$ safaribooks -c 'BrowserCookie=0eb1e1a9-2f0f-4034-874f-b72f39f59682;SessionID=18ka8abjrrhd3myc5zljpmpvguscj2e0' -b 9781449340124 download-epub
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-04 15:57:50 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.4-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-04 15:57:50 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-04 15:57:50 [SafariBooks] INFO: Using `/tmp/tmpo4v1aG` as temporary directory
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 15:57:50 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 15:57:50 [scrapy.core.engine] INFO: Spider opened
2018-12-04 15:57:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 15:57:50 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-04 15:57:51 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-04 15:57:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: None)
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-04 15:57:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/accounts/login/> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://www.safaribooksonline.com/accounts/login/)
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:53 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:54 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:55 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:56 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:57 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:58 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:57:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-04 15:57:59 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-04 15:58:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-04 15:58:00 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-04 15:58:00 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-04 15:58:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16754,
 'downloader/request_count': 30,
 'downloader/request_method_count/GET': 30,
 'downloader/response_bytes': 214326,
 'downloader/response_count': 30,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/404': 24,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 4, 14, 58, 0, 688254),
 'httperror/response_ignored_count': 24,
 'httperror/response_ignored_status_count/404': 24,
 'log_count/DEBUG': 31,
 'log_count/INFO': 34,
 'memusage/max': 62570496,
 'memusage/startup': 62570496,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 30,
 'scheduler/dequeued/memory': 30,
 'scheduler/enqueued': 30,
 'scheduler/enqueued/memory': 30,
 'start_time': datetime.datetime(2018, 12, 4, 14, 57, 50, 251804)}
2018-12-04 15:58:00 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 12K
drwxr-xr-x 2 chris chris 4.0K Dec  4 15:58 .
drwxr-xr-x 5 chris chris 4.0K Dec  4 15:58 ..
-rw-r--r-- 1 chris chris 2.7K Dec  4 15:58 Head_First_JavaScript_Programming-9781449340124.epub

The downloaded epub is very small, only 2.7 kB.

It seems that only some metadata is downloaded, without any content.
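
To check what actually ended up in the file, here is a minimal sketch (not part of safaribooks; the names `list_epub_entries` and `summarize_epub` are made up for illustration). An epub is a zip container, so Python's standard `zipfile` module can list its entries. The sketch assumes chapter content is stored as `.html` or `.xhtml` files.

```python
import zipfile


def list_epub_entries(path):
    """Return (name, uncompressed_size) pairs for every file in the epub."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def summarize_epub(path):
    """Count all entries and those that look like chapter content.

    Assumption: chapters are stored as .html or .xhtml files.
    """
    entries = list_epub_entries(path)
    chapters = [
        name for name, _ in entries
        if name.lower().endswith((".html", ".xhtml"))
    ]
    return {"entries": len(entries), "chapters": len(chapters)}
```

Calling `summarize_epub("converted/Head_First_JavaScript_Programming-9781449340124.epub")` and getting zero chapters would suggest that the archive holds only metadata files.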

Any hints?

thanks,
Chris

@rahulonmars

Same for me, it is not working.
Only the title is downloaded.

@skeep

skeep commented Dec 9, 2018

Same issue. I am logged in using company SSO.

@owen800q

Same issue.

@821wkli

821wkli commented Dec 15, 2018

This issue was fixed in #60.

@ciapecki
Author

ciapecki commented Dec 15, 2018

I fetched that commit but see no change:

ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-15 18:19:36 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-15 18:19:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.4-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-15 18:19:36 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-15 18:19:36 [SafariBooks] INFO: Using `/tmp/tmpAH1dtL` as temporary directory
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-15 18:19:36 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-15 18:19:36 [scrapy.core.engine] INFO: Spider opened
2018-12-15 18:19:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-15 18:19:36 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-15 18:19:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-15 18:19:37 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-15 18:19:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-15 18:19:39 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:40 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:40 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:41 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:41 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:42 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:42 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:43 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:43 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:44 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:45 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:46 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:46 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:46 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:47 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:47 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-15 18:19:48 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-15 18:19:48 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-15 18:19:48 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-15 18:19:48 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-15 18:19:48 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-15 18:19:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 14221,
 'downloader/request_count': 32,
 'downloader/request_method_count/GET': 32,
 'downloader/response_bytes': 214999,
 'downloader/response_count': 32,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 4,
 'downloader/response_status_count/404': 24,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 15, 17, 19, 48, 121239),
 'httperror/response_ignored_count': 24,
 'httperror/response_ignored_status_count/404': 24,
 'log_count/DEBUG': 33,
 'log_count/INFO': 34,
 'memusage/max': 61190144,
 'memusage/startup': 61190144,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 32,
 'scheduler/dequeued/memory': 32,
 'scheduler/enqueued': 32,
 'scheduler/enqueued/memory': 32,
 'start_time': datetime.datetime(2018, 12, 15, 17, 19, 36, 819662)}
2018-12-15 18:19:48 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 16K
drwxr-xr-x 2 chris chris 4.0K Dec 15 18:19 .
drwxr-xr-x 5 chris chris 4.0K Dec 15 18:19 ..
-rw-r--r-- 1 chris chris 2.7K Dec 15 18:19 Head_First_JavaScript_Programming-9781449340124.epub
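
A 2.7K file is far too small for a whole book. Below is a minimal sketch for checking what the archive actually contains. It assumes the .epub is an ordinary zip file, as the "Made archive ... .zip" and "Moving ... to ... .epub" log lines suggest. `list_epub_entries` and `has_chapter_files` are hypothetical helpers, not part of safaribooks.

```python
import zipfile


def list_epub_entries(path):
    """Return (name, uncompressed_size) pairs for every file in the archive."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


def has_chapter_files(entries, suffixes=(".html", ".xhtml")):
    """True if at least one non-empty HTML/XHTML file is present.

    The suffix list is an assumption about how chapters are stored.
    """
    return any(name.lower().endswith(suffixes) and size > 0 for name, size in entries)
```

Calling `list_epub_entries("converted/Head_First_JavaScript_Programming-9781449340124.epub")` and passing the result to `has_chapter_files` should show whether any chapter made it into the file or only the metadata did.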

@hankbao

hankbao commented Dec 20, 2018

I can confirm that the issue is still there.

@hankbao

hankbao commented Dec 20, 2018

Hey guys, you can use my fix in #62 to download the epub for now.

@ciapecki
Author

:(

ruby-2.5.1 [chris@t480cia safaribooks]$ git log -1
commit 1f9ccc9dcf55a74fe4ea4600cea0649311f7f0d8 (HEAD -> pr/62, origin/pr/62)
Author: Hank Bao <[email protected]>
Date:   Fri Dec 21 02:11:49 2018 +0800

    fix: update host in urls with usage text
ruby-2.5.1 [chris@t480cia safaribooks]$ 

ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-20 21:31:26 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-20 21:31:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.9-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-20 21:31:26 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-20 21:31:26 [SafariBooks] INFO: Using `/tmp/tmp28d5rb` as temporary directory
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-20 21:31:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-20 21:31:26 [scrapy.core.engine] INFO: Spider opened
2018-12-20 21:31:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-20 21:31:26 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-20 21:31:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/>
2018-12-20 21:31:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-20 21:31:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-20 21:31:27 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://www.safaribooksonline.com/home/> from <GET https://www.safaribooksonline.com/home>
2018-12-20 21:31:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.safaribooksonline.com/accounts/login/> from <GET https://www.safaribooksonline.com/home/>
2018-12-20 21:31:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://www.safaribooksonline.com/accounts/login/>
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:29 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:29 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:30 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:31 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:31 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:32 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:33 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:34 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:34 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:35 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:35 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:36 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.safaribooksonline.com//library/cover/9781449340124/> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=9781449340124)
2018-12-20 21:31:36 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//library/cover/9781449340124/>: HTTP status code is not handled or not allowed
2018-12-20 21:31:36 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-20 21:31:36 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-20 21:31:36 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-20 21:31:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 14221,
 'downloader/request_count': 32,
 'downloader/request_method_count/GET': 32,
 'downloader/response_bytes': 214969,
 'downloader/response_count': 32,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 4,
 'downloader/response_status_count/404': 24,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 20, 20, 31, 36, 568840),
 'httperror/response_ignored_count': 24,
 'httperror/response_ignored_status_count/404': 24,
 'log_count/DEBUG': 33,
 'log_count/INFO': 34,
 'memusage/max': 61202432,
 'memusage/startup': 61202432,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 32,
 'scheduler/dequeued/memory': 32,
 'scheduler/enqueued': 32,
 'scheduler/enqueued/memory': 32,
 'start_time': datetime.datetime(2018, 12, 20, 20, 31, 26, 613915)}
2018-12-20 21:31:36 [scrapy.core.engine] INFO: Spider closed (finished)

-rw-r--r-- 1 chris chris 2.7K Dec 20 21:31 Head_First_JavaScript_Programming-9781449340124.epub

@hankbao

hankbao commented Dec 21, 2018

@ciapecki You were still using the old version. You need to uninstall the old version first and then set up my fix again.
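
One way to tell which version actually ran is to look at the hosts in the log: the run above still requested www.safaribooksonline.com. Here is a minimal sketch, assuming the Scrapy output has been saved as text. `summarize_crawl` is a hypothetical helper, not part of safaribooks or Scrapy.

```python
import re
from collections import Counter

# Matches Scrapy engine lines such as:
#   ... DEBUG: Crawled (404) <GET https://host/path> (referer: ...)
CRAWLED = re.compile(r"Crawled \((\d{3})\) <GET https?://([^/>\s]+)")


def summarize_crawl(log_text):
    """Count (host, status) pairs across the 'Crawled' lines of a Scrapy log."""
    return Counter((host, int(status)) for status, host in CRAWLED.findall(log_text))
```

If the counts are dominated by `("www.safaribooksonline.com", 404)`, the old code is most likely still the one being executed.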

@ciapecki
Author

@hankbao I uninstalled the old version first this time, but I still get a similar, nearly empty file:

ruby-2.5.1 [chris@t480cia safaribooks]$ sudo pip2 uninstall safaribooks
[sudo] password for chris: 
Uninstalling safaribooks-0.1.1:
  Would remove:
    /usr/bin/safaribooks
    /usr/lib/python2.7/site-packages/safaribooks-0.1.1-py2.7.egg-info
    /usr/lib/python2.7/site-packages/safaribooks/*
Proceed (y/n)? y
  Successfully uninstalled safaribooks-0.1.1
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks
bash: /usr/bin/safaribooks: No such file or directory

Then I installed it again and ran:

Successfully installed safaribooks-0.1.1
ruby-2.5.1 [chris@t480cia safaribooks]$ safaribooks -c 'BrowserCookie=cf7fba15-bf46-485d-b585-97c91161aca7;SessionID=x80tkjvh1dylp5hhz5xng8wym1yaehfh' -b 9781449340124 download-epub
2018-12-21 08:14:49 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: safaribooks)
2018-12-21 08:14:49 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.15 (default, Jun 27 2018, 13:05:28) - [GCC 8.1.1 20180531], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 2.4.2, Platform Linux-4.19.9-arch1-1-ARCH-x86_64-with-glibc2.2.5
2018-12-21 08:14:49 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders'], 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'safaribooks'}
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2018-12-21 08:14:49 [SafariBooks] INFO: Using `/tmp/tmpKwNTat` as temporary directory
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-21 08:14:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-21 08:14:49 [scrapy.core.engine] INFO: Spider opened
2018-12-21 08:14:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-21 08:14:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-12-21 08:14:49 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/>
2018-12-21 08:14:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2018-12-21 08:14:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://learning.oreilly.com/home/> from <GET https://learning.oreilly.com/home>
2018-12-21 08:14:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/home/>
2018-12-21 08:14:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-21 08:14:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124> (referer: https://learning.oreilly.com/accounts/login/)
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch11.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch12.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch11.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch12.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch10.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch10.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:53 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch08.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch08.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch09.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch07.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch09.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch07.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch06.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch06.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:54 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch05.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch05.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch04.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch04.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch03.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch03.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:55 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch01.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:55 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch01.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr04.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr04.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr05.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr03.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr05.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr03.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/pr02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/copyright.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/copyright.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/co02.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:57 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/co02.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:57 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/author_bios.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/author_bios.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ix01.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ix01.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/apa.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/apa.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:58 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch13.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:58 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/ch13.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:59 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://learning.oreilly.com/api/v1/book/9781449340124/chapter/dedication.html> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://learning.oreilly.com/api/v1/book/9781449340124/chapter/dedication.html>: HTTP status code is not handled or not allowed
2018-12-21 08:14:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/library/cover/9781449340124/> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9781449340124)
2018-12-21 08:14:59 [scrapy.core.engine] INFO: Closing spider (finished)
2018-12-21 08:14:59 [SafariBooks] INFO: Made archive /home/chris/staging/safaribooks/head-first-javascript.zip
2018-12-21 08:14:59 [SafariBooks] INFO: Moving /home/chris/staging/safaribooks/head-first-javascript.zip to /home/chris/staging/safaribooks/converted/Head_First_JavaScript_Programming-9781449340124.epub
2018-12-21 08:14:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 16440,
 'downloader/request_count': 30,
 'downloader/request_method_count/GET': 30,
 'downloader/response_bytes': 52402,
 'downloader/response_count': 30,
 'downloader/response_status_count/200': 4,
 'downloader/response_status_count/301': 1,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/401': 23,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 12, 21, 7, 14, 59, 342137),
 'httperror/response_ignored_count': 23,
 'httperror/response_ignored_status_count/401': 23,
 'log_count/DEBUG': 31,
 'log_count/INFO': 33,
 'memusage/max': 61227008,
 'memusage/startup': 61227008,
 'request_depth_max': 3,
 'response_received_count': 27,
 'scheduler/dequeued': 30,
 'scheduler/dequeued/memory': 30,
 'scheduler/enqueued': 30,
 'scheduler/enqueued/memory': 30,
 'start_time': datetime.datetime(2018, 12, 21, 7, 14, 49, 131657)}
2018-12-21 08:14:59 [scrapy.core.engine] INFO: Spider closed (finished)
ruby-2.5.1 [chris@t480cia safaribooks]$ ls -al converted/
total 20K
drwxr-xr-x 2 chris chris 4.0K Dec 21 08:14 .
drwxr-xr-x 5 chris chris 4.0K Dec 21 08:14 ..
-rw-r--r-- 1 chris chris 9.4K Dec 21 08:14 Head_First_JavaScript_Programming-9781449340124.epub

The file is bigger than before (9.4 kB instead of 2.7 kB), but it still has no content.

hankbao commented Dec 21, 2018

@ciapecki A lot of 401 errors showed up. It seems the authentication credentials you provided were invalid.

Can you try downloading your book with username and password?
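
The same 401 pattern can be read off the stats dictionary that Scrapy dumps at the end of a run. A minimal sketch, assuming the stats are available as a plain Python dict; `auth_failure_ratio` is a hypothetical helper, not part of safaribooks or Scrapy:

```python
def auth_failure_ratio(stats):
    """Return the fraction of received responses that came back as HTTP 401."""
    received = stats.get("response_received_count", 0)
    unauthorized = stats.get("downloader/response_status_count/401", 0)
    return unauthorized / received if received else 0.0


# Values copied from the stats dump in the log above.
stats = {
    "downloader/response_status_count/200": 4,
    "downloader/response_status_count/401": 23,
    "response_received_count": 27,
}

ratio = auth_failure_ratio(stats)
if ratio > 0.5:
    print(f"{ratio:.0%} of responses were 401: the credentials were probably rejected")
```

The 0.5 threshold is arbitrary; the point is only that a run dominated by 401s suggests an authentication problem rather than a problem with individual chapters.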

@ciapecki
Author

@hankbao I am logged in with my company's SSO. We don't have a username/password.
While I am logged in (I can see and read books), I get the BrowserCookie and SessionID from the Chrome Inspect panel (F12).
Maybe I am missing some more details from the cookie?
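
One way to check for missing cookies is to compare the names passed to `-c` with the names in the `Cookie` header the browser actually sends. A rough sketch, assuming both are available as `name=value;name=value` strings; `parse_cookie_string` is a hypothetical helper and `OtherCookie` is a made-up name, since the thread does not say which cookies the site requires:

```python
def parse_cookie_string(raw):
    """Split a 'name=value;name=value' string into a dict."""
    cookies = {}
    for pair in raw.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            cookies[name.strip()] = value.strip()
    return cookies


# String in the shape passed to -c (values shortened to placeholders).
passed = parse_cookie_string("BrowserCookie=aaa;SessionID=bbb")

# Made-up browser header; "OtherCookie" is a placeholder name.
in_browser = parse_cookie_string("BrowserCookie=aaa; SessionID=bbb; OtherCookie=ccc")

# Cookie names the browser sends that were not passed on the command line.
missing = sorted(set(in_browser) - set(passed))
print(missing)
```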

hankbao commented Dec 21, 2018

> @hankbao I am logged in with my company's SSO. We don't have a username/password.
> While I am logged in (I can see and read books), I get the BrowserCookie and SessionID from the Chrome Inspect panel (F12).
> Maybe I am missing some more details from the cookie?

I haven't looked into the cookie and session part of the code, so I'm not sure. However, with username and password I can download my book now. Some pages occasionally returned 503 errors, but you can always get the whole book by retrying.
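
The retry-on-503 idea can be sketched generically. This is not how safaribooks or Scrapy handles retries; `fetch_with_retry` and the `fetch` callable are hypothetical names used only for illustration:

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url) until it returns a non-503 status or retries run out.

    `fetch` is any callable returning a (status_code, body) tuple.
    """
    status, body = fetch(url)
    for attempt in range(retries):
        if status != 503:
            break
        time.sleep(delay * (attempt + 1))  # wait a little longer each time
        status, body = fetch(url)
    return status, body


# Stand-in for a real HTTP call: fails twice with 503, then succeeds.
responses = iter([(503, ""), (503, ""), (200, "<html>chapter</html>")])
status, body = fetch_with_retry(
    lambda url: next(responses), "https://example.invalid/ch01.html", delay=0
)
print(status)
```

If every attempt returns 503, the last 503 response is handed back so the caller can decide what to do with it.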

@sanmibuh

Thanks @hankbao It works for me with Docker and my company's SSO

tofagerl commented Jan 6, 2019

@hankbao I still have the same problem as @sanmibuh, with both Docker and the normal CLI, and with both user/pass and cookie. I'm including the log from the Docker-and-cookie run, but the 401 errors are the same in the other three configurations.
Log: https://www.dropbox.com/s/i3xmvcskwgt9yf1/safaribooks.log?dl=0

hankbao commented Jan 6, 2019

> @hankbao I still have the same problem as @sanmibuh, with both Docker and the normal CLI, and with both user/pass and cookie. I'm including the log from the Docker-and-cookie run, but the 401 errors are the same in the other three configurations.
> Log: https://www.dropbox.com/s/i3xmvcskwgt9yf1/safaribooks.log?dl=0

If you got 401s with username/password, perhaps your password is indeed incorrect. I'm not familiar with the cookie part of this project. Maybe @sanmibuh could share his experience.

tofagerl commented Jan 6, 2019

@hankbao Yeah, I thought the same, but it's the exact same one I use to log in with. Copied straight out of my password manager. I'm gonna change it and see if that works.

tofagerl commented Jan 6, 2019

@hankbao Oh, ok. I changed my password, and that didn't work, but then I put it in quotes, and that worked. I use autogenerated passwords with lots of weird characters, so I should have thought of that earlier.
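
Why quoting matters can be illustrated with Python's `shlex`, which splits a command line roughly the way a POSIX shell does. This is only a sketch: `mytool` and `--password` are placeholders rather than the real safaribooks options, the password is made up, and a real shell would additionally interpret characters such as `&` and `$`, which `shlex` does not model:

```python
import shlex

password = "p@ss w0rd&$HOME!"  # made-up example password

# Unquoted: the space splits the password into two separate arguments.
unquoted = shlex.split(f"mytool --password {password}")
print(unquoted)

# Quoted with shlex.quote: the password survives as a single argument.
quoted = shlex.split(f"mytool --password {shlex.quote(password)}")
print(quoted[-1] == password)
```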

@BrianBrinkley

@hankbao or @tofagerl I'm a little lost. I keep getting either:

Traceback (most recent call last):
  File "/usr/local/bin/safaribooks", line 11, in <module>
    load_entry_point('safaribooks', 'console_scripts', 'safaribooks')()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 487, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2728, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2346, in load
    return self.resolve()
  File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2352, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
ModuleNotFoundError: No module named 'safaribooks.__main__'

or

docker: Error response from daemon: create $(pwd)/converted: "$(pwd)/converted" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
See 'docker run --help'.

Thanks.
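
The Docker message suggests that `$(pwd)` reached Docker literally instead of being expanded by the shell, so an absolute host path is needed. A small sketch of computing one in Python; `volume_arg` is a hypothetical helper and `/app/converted` is a placeholder container path, not something confirmed by this thread:

```python
from pathlib import Path


def volume_arg(host_dir, container_dir):
    """Build a `host:container` bind-mount spec with an absolute host path."""
    host_path = Path(host_dir).resolve()  # relative paths resolve against the current directory
    return f"{host_path}:{container_dir}"


# "/app/converted" is a placeholder; use whatever path the image expects.
print(volume_arg("converted", "/app/converted"))
```

The printed value can be passed to `docker run -v` in place of the unexpanded `$(pwd)/converted` spec.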

rahulonmars commented Jan 17, 2019

> Hey guys, you can use my fix in #62 to download epub for now.

I can confirm this works, but I'm not able to open the epub.

JoeriBe commented Jan 21, 2019

Having the same issue:

2019-01-21 12:11:15 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://www.safaribooksonline.com//api/v1/book/9781457191350/chapter/04-ch1.xhtml>: HTTP status code is not handled or not allowed
