URL unable to parse/Bypass robots.txt #58

Open
hugolundin opened this issue Mar 27, 2017 · 2 comments

Comments

@hugolundin

I have an HTTPS URL that WKZombie isn't able to parse. With other tools I've needed to bypass robots.txt, but there doesn't seem to be any setting for this in WKZombie?

@mkoehnke
Owner

Hi @hugolundin. No, currently there's no such setting. What are you trying to accomplish? Maybe changing the user agent or adjusting the HTTP headers would help? Something along the lines of the sketch below, for example.
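
Since WKZombie is built on top of WKWebView, a minimal sketch with plain WebKit APIs would look roughly like this (whether WKZombie surfaces these hooks directly is an assumption, and the user agent string and URL are placeholders):

```swift
import WebKit

// Minimal sketch using plain WebKit APIs. WKZombie wraps WKWebView,
// but whether it exposes these hooks directly is an assumption.
let configuration = WKWebViewConfiguration()
let webView = WKWebView(frame: .zero, configuration: configuration)

// Spoof a desktop browser user agent (placeholder value).
webView.customUserAgent =
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/603.1.30 " +
    "(KHTML, like Gecko) Version/10.1 Safari/603.1.30"

// Adjust HTTP headers on the request itself (placeholder URL and values).
var request = URLRequest(url: URL(string: "https://example.com")!)
request.setValue("text/html,application/xhtml+xml", forHTTPHeaderField: "Accept")
request.setValue("https://example.com/", forHTTPHeaderField: "Referer")

webView.load(request)
```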

@hugolundin
Author

I am trying to parse a website for some URLs. It has worked fine using Selenium with PhantomJS, and also with Mechanize in Python, but when I try it with WKZombie, the website loads until it logs "Unable to parse". The reason I thought of robots.txt was that Mechanize complained about it before I enabled its setting to bypass it.

Do you have any suggestions for common ways to change the user agent and/or the HTTP headers? Thank you very much for your reply!
