
Requests for og being tagged as robot requests #8

Open
YasharF opened this issue Mar 29, 2016 · 4 comments


@YasharF

YasharF commented Mar 29, 2016

Some sites treat requests from this module as robot requests and respond with a stripped-down version of the page that is missing the og data.

Example: https://www.sciencedaily.com/releases/2016/03/160310082606.htm

@YasharF
Author

YasharF commented Mar 29, 2016

I think setting a user agent in the requests may resolve this problem, but I'm not sure.

@samholmes
Owner

Hmm, interesting. I hadn't thought about this. Is it good for the internet for a robot to pretend not to be one? And is this module a robot at all? I guess that depends on how it's used.

Maybe an option to set a user agent would help with this. I'll have to think about it.

@theunexpected1

theunexpected1 commented Oct 6, 2018

Do we have a user agent defined for requests going through this?

I am working on a site that does not provide the og tags because the request's user agent is not one of those listed below:

"googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|Viber|WhatsApp|Telegram"

Could the module set its user agent to one of these, or perhaps define a new one?

@samholmes
Owner

samholmes commented Oct 8, 2018

Until new APIs are added to support customizing requests, you can fetch the HTML yourself using a library like request or axios. Once you have the HTML, you can pass it to the parse function:

og.parse(html)

This synchronous function returns the Open Graph metadata object without making any HTTP requests.
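
The workaround could look roughly like the sketch below. It is an illustration and not part of the module. The helper names (`buildRequestOptions`, `fetchHtml`, `fetchOpenGraph`) and the default user agent string are hypothetical, and the global `fetch` built into Node 18+ stands in for request or axios. The only call taken from this thread is `og.parse(html)`, which is passed in as the `parse` argument.

```javascript
// Hypothetical helpers; only og.parse(html) comes from the thread above.
// Assumption: the target site serves og tags to this crawler user agent.
const DEFAULT_USER_AGENT = 'facebookexternalhit/1.1';

// Build fetch options that carry an explicit User-Agent header.
function buildRequestOptions(userAgent = DEFAULT_USER_AGENT) {
  return { headers: { 'User-Agent': userAgent }, redirect: 'follow' };
}

// Download the page HTML. fetchImpl defaults to the global fetch (Node 18+).
async function fetchHtml(url, userAgent, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, buildRequestOptions(userAgent));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Fetch the HTML ourselves, then hand it to a synchronous parser such as og.parse.
async function fetchOpenGraph(url, parse, userAgent, fetchImpl) {
  const html = await fetchHtml(url, userAgent, fetchImpl);
  return parse(html);
}
```

Assuming `og` is the module discussed in this issue, a call would be `fetchOpenGraph(url, og.parse, 'my-bot/1.0').then(console.log)`. Passing the parser and the fetch function as arguments keeps the sketch independent of any particular HTTP library.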


3 participants