403 when fetching chapters from API #65

Open
elpiel opened this issue Mar 3, 2019 · 0 comments

elpiel commented Mar 3, 2019

I am trying to download one book, but every chapter request returns a 403 error. When I open the failing URL in the browser while still logged in, I get "You do not have permission to perform this action."

2019-03-03 13:34:56 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: safaribooks)
2019-03-03 13:34:56 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 16.4.1, Python 3.6.8 (default, Jan 30 2019, 23:54:38) - [GCC 6.4.0], pyOpenSSL 19.0.0 (OpenSSL 1.0.2q  20 Nov 2018), cryptography 2.6.1, Platform Linux-4.15.0-45-generic-x86_64-with
2019-03-03 13:34:56 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'safaribooks', 'DOWNLOAD_DELAY': 0.25, 'NEWSPIDER_MODULE': 'safaribooks.spiders', 'SPIDER_MODULES': ['safaribooks.spiders']}
2019-03-03 13:34:56 [scrapy.extensions.telnet] INFO: Telnet Password: ba131cc2f5422341
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2019-03-03 13:34:56 [SafariBooks] INFO: Using `/tmp/tmp6l2jbpbq` as temporary directory
2019-03-03 13:34:56 [scrapy.core.downloader.handlers] ERROR: Loading "scrapy.core.downloader.handlers.ftp.FTPDownloadHandler" for scheme "ftp"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/downloader/handlers/__init__.py", line 48, in _load_handler
    dhcls = load_object(path)
  File "/usr/local/lib/python3.6/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.6/site-packages/scrapy/core/downloader/handlers/ftp.py", line 36, in <module>
    from twisted.protocols.ftp import FTPClient, CommandFailed
ModuleNotFoundError: No module named 'twisted.protocols.ftp'
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-03-03 13:34:56 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-03-03 13:34:56 [scrapy.core.engine] INFO: Spider opened
2019-03-03 13:34:56 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-03-03 13:34:56 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-03-03 13:34:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/accounts/login/> from <GET https://learning.oreilly.com/>
2019-03-03 13:34:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/accounts/login/> (referer: None)
2019-03-03 13:34:58 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://learning.oreilly.com/home/> from <POST https://learning.oreilly.com/accounts/login/>
2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/home/> (referer: https://learning.oreilly.com/accounts/login/)
2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681> (referer: https://learning.oreilly.com/home/)
2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch06.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:34:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch06.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:00 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch05.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:00 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch05.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:00 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch03.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch03.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch04.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch04.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch02.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:01 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch02.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:01 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/preface.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/preface.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/foreword.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/foreword.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:02 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/toc.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/toc.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/dedication.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/dedication.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/copy.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/copy.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/title.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:03 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/title.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:03 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm02.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm02.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:04 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/pref00.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/pref00.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:04 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/cover.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://learning.oreilly.com/library/cover/9780134757681/> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:04 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/cover.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01_images.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch01_images.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/index.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm03.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/biblo.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/index.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/fm03.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/biblo.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch12.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch12.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch11.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch11.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:06 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch10.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch10.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch09.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch09.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch08.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch08.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch07.xhtml> (referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)
2019-03-03 13:35:07 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch07.xhtml>: HTTP status code is not handled or not allowed
2019-03-03 13:35:07 [scrapy.core.engine] INFO: Closing spider (finished)
2019-03-03 13:35:07 [SafariBooks] INFO: Made archive /app/refactoring-improving-the.zip
2019-03-03 13:35:07 [SafariBooks] INFO: Moving /app/refactoring-improving-the.zip to download/Refactoring__Improving_the_Design_of_Existing_Code-9780134757681.epub
2019-03-03 13:35:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 17319,
 'downloader/request_count': 31,
 'downloader/request_method_count/GET': 30,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 63071,
 'downloader/response_count': 31,
 'downloader/response_status_count/200': 4,
 'downloader/response_status_count/302': 2,
 'downloader/response_status_count/403': 25,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 3, 3, 13, 35, 7, 954511),
 'httperror/response_ignored_count': 25,
 'httperror/response_ignored_status_count/403': 25,
 'log_count/DEBUG': 31,
 'log_count/ERROR': 1,
 'log_count/INFO': 37,
 'memusage/max': 55988224,
 'memusage/startup': 55988224,
 'request_depth_max': 3,
 'response_received_count': 29,
 'scheduler/dequeued': 31,
 'scheduler/dequeued/memory': 31,
 'scheduler/enqueued': 31,
 'scheduler/enqueued/memory': 31,
 'start_time': datetime.datetime(2019, 3, 3, 13, 34, 56, 344411)}
2019-03-03 13:35:07 [scrapy.core.engine] INFO: Spider closed (finished)
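
For triage, here is a minimal sketch that groups the `Crawled (...)` lines of a log like the one above by HTTP status, to show which URLs were rejected. It only reads the log text and does not touch Scrapy itself. The function name `summarize_crawl` and the regular expression are assumptions made for illustration; they are not part of safaribooks or Scrapy.

```python
import re
from collections import Counter

# Matches Scrapy engine lines of the form:
#   ... [scrapy.core.engine] DEBUG: Crawled (403) <GET https://...> (referer: ...)
# Hypothetical pattern written against the log format shown in this issue.
CRAWLED_RE = re.compile(
    r"\[scrapy\.core\.engine\] DEBUG: Crawled \((\d{3})\) <(\w+) ([^>\s]+)>"
)


def summarize_crawl(log_text):
    """Return (status_counts, urls_by_status) for 'Crawled' lines in a Scrapy log."""
    counts = Counter()
    urls = {}
    for line in log_text.splitlines():
        match = CRAWLED_RE.search(line)
        if match is None:
            continue  # not an engine 'Crawled' line (e.g. httperror INFO lines)
        status = int(match.group(1))
        counts[status] += 1
        urls.setdefault(status, []).append(match.group(3))
    return counts, urls


if __name__ == "__main__":
    # Two lines copied from the log in this issue.
    sample = (
        "2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (200) "
        "<GET https://learning.oreilly.com/home/> "
        "(referer: https://learning.oreilly.com/accounts/login/)\n"
        "2019-03-03 13:34:59 [scrapy.core.engine] DEBUG: Crawled (403) "
        "<GET https://learning.oreilly.com/api/v1/book/9780134757681/chapter/ch06.xhtml> "
        "(referer: https://learning.oreilly.com/nest/epub/toc/?book_id=9780134757681)\n"
    )
    counts, urls = summarize_crawl(sample)
    for status in sorted(counts):
        print(status, counts[status], urls[status])
```

Run against the full log, this should show whether the 403 responses are confined to the `/api/v1/book/.../chapter/` URLs while the login, home, TOC, and cover requests succeed, which is what the stats dump above suggests.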
