
Cache posts for QAPage #7

Open
jaspervriends opened this issue Jan 7, 2019 · 1 comment
Labels
discuss Discussion until answer solution needed Suggest and discuss what would be the best solution for this problem

Comments

@jaspervriends
Member

There has been some discussion between matteocontrini and me on the Flarum forum about the best way to send the posts of a discussion to crawlers using QAPage.

Possible solutions:

  • Don't add posts to the QAPages
  • Limit the posts in the QAPages to a maximum of 5 results, since search engines only preview a few of them; however, Google, for example, also indexes the content of the posts, which could be useful for search results
  • Cache the JSON string of the discussion posts, save it to the cache directory on the server disk, and re-cache it once per day per discussion
  • Hide the JSON (with the posts) from normal browser requests and only serve it to crawlers/bots
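The third option (a daily per-discussion cache on disk) could look roughly like the sketch below. This is a minimal illustration, not Flarum code: `CACHE_DIR`, `MAX_AGE`, and `build_posts_json` are all hypothetical names I'm assuming for the example.

```python
# Hypothetical sketch of option 3: cache the QAPage posts JSON on disk
# and rebuild it at most once per day per discussion.
import json
import os
import time

CACHE_DIR = "cache/qapage"   # assumed cache directory on the server disk
MAX_AGE = 24 * 60 * 60       # re-cache once per day

def build_posts_json(discussion_id):
    # Placeholder for the real query that serializes a discussion's posts.
    return json.dumps({"discussion": discussion_id, "posts": []})

def cached_posts_json(discussion_id):
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{discussion_id}.json")
    try:
        # Serve the cached copy if it is less than a day old.
        if time.time() - os.path.getmtime(path) < MAX_AGE:
            with open(path) as f:
                return f.read()
    except OSError:
        pass  # cache miss: the file does not exist yet
    data = build_posts_json(discussion_id)
    with open(path, "w") as f:
        f.write(data)
    return data
```

The file's modification time doubles as the cache timestamp, so no extra bookkeeping is needed; the trade-off is one `stat` call per request.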

Discuss

What would be a great solution? Other ideas or tips?

@jaspervriends jaspervriends added solution needed Suggest and discuss what would be the best solution for this problem discuss Discussion until answer labels Jan 7, 2019
@jaspervriends jaspervriends changed the title Chache posts for QAPage Jan 7, 2019
@matteocontrini

matteocontrini commented Jan 7, 2019

If I had to make a choice, I would go for the option of showing the posts data to crawlers only. I'm thinking about cases where a discussion has a lot of posts, where it would be far from ideal (and useless) to send potentially hundreds of kilobytes of data to every user who reaches that post.

As mentioned "on the other side", there seem to be great libraries that take care of matching the user agent, and the one I linked supports more than a thousand patterns, with an additional generic one that matches words ending in "bot", etc. The only thing I would test is the overhead of matching a string against a thousand regular expressions...
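One way to keep that overhead down is to compile all the alternatives into a single regex instead of looping over them one by one. The sketch below illustrates the idea; the pattern list is a tiny illustrative sample I made up, not the real crawler database from any library.

```python
# Sketch of matching a User-Agent against many crawler patterns at once.
# Joining the alternatives into one compiled regex avoids iterating over
# a thousand individual expressions per request.
import re

# Illustrative sample only, not a real crawler pattern list.
CRAWLER_PATTERNS = [
    r"Googlebot",
    r"bingbot",
    r"DuckDuckBot",
    r"[a-z0-9.\-]*bot\b",   # generic fallback: words ending in "bot"
]

CRAWLER_RE = re.compile(
    "|".join(f"(?:{p})" for p in CRAWLER_PATTERNS),
    re.IGNORECASE,
)

def is_crawler(user_agent):
    """Return True if the User-Agent string matches any crawler pattern."""
    return bool(CRAWLER_RE.search(user_agent or ""))
```

With an alternation compiled once at startup, the per-request cost is a single scan of the User-Agent string rather than up to a thousand separate matches.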

I'm not an SEO expert, so I'm not really sure how good or bad this solution actually is, but the only downside I can think of is that some very unpopular crawlers out there might not be in that list. But honestly, this feature is primarily intended for Google, so as long as you allow it, it should be OK. The Flarum API is always there if someone wants the full collection of posts for a discussion (I believe Google already has some kind of pattern for parsing Flarum discussions; I've been really impressed by how much it's able to scrape/index with zero optimizations).

EDIT: you have a typo in the title; it should be spelled "cache"

@jaspervriends jaspervriends changed the title Chache posts for QAPage Cache posts for QAPage Jan 7, 2019