URL unable to parse/Bypass robots.txt #58

Open
hugolundin opened this issue Mar 27, 2017 · 2 comments

Comments

@hugolundin

I have an HTTPS URL that WKZombie isn't able to parse. With other tools I've needed to bypass robots.txt, but there doesn't seem to be any setting for this in WKZombie?

@mkoehnke
Owner

Hi @hugolundin. No, currently there's no such setting. What are you trying to accomplish? Maybe changing the user agent or adjusting the HTTP headers would help? Something along the lines of the sketch below, for example.
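
Since WKZombie is built on top of WKWebView, a minimal sketch with plain WebKit APIs would look roughly like this (whether WKZombie surfaces these hooks directly is an assumption, and the user agent string and URL are placeholders):

```swift
import WebKit

// Minimal sketch using plain WebKit APIs. WKZombie wraps WKWebView,
// but whether it exposes these hooks directly is an assumption.
let configuration = WKWebViewConfiguration()
let webView = WKWebView(frame: .zero, configuration: configuration)

// Spoof a desktop browser user agent (placeholder value).
webView.customUserAgent =
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/603.1.30 " +
    "(KHTML, like Gecko) Version/10.1 Safari/603.1.30"

// Adjust HTTP headers on the request itself (placeholder URL and values).
var request = URLRequest(url: URL(string: "https://example.com")!)
request.setValue("text/html,application/xhtml+xml", forHTTPHeaderField: "Accept")
request.setValue("https://example.com/", forHTTPHeaderField: "Referer")

webView.load(request)
```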

@hugolundin
Author

I am trying to parse a website for some URLs. It has worked fine using Selenium with PhantomJS, and also with Mechanize in Python, but when I try it with WKZombie, the website loads until it logs "Unable to parse". The reason I thought of robots.txt was that Mechanize complained about it before I enabled its setting to bypass it.

Do you have any suggestions for common ways to change the user agent and/or the HTTP headers? Thank you very much for your reply!
