Is your feature request related to a problem? Please describe.
Description
The current default robots.txt can confuse users. It does not help them understand why a production website would return it instead of the robots.txt configured in the site repository.
This is the current default robots.txt:
```
# Helix robots.txt FAQ
#
# Q: This looks like a default robots.txt, how can I provide my own?
# A: Put a file named robots.txt into the root of your GitHub
# repo, Franklin will serve it from there.
#
# Q: Why am I'm seeing this robots.txt instead of the one I
# configured?
# A: You are visiting from *.hlx.page or *.hlx.live - in order
# to prevent these sites from showing up in search engines and
# giving you a duplicate content penalty on your real site we
# exclude all robots
#
# Q: What do you mean with "real site"?
# A: If you add a custom domain to this site (e.g.
# example.com), then Franklin detects that you are ready for
# production and serves your own robots.txt - but only on
# example.com
#
# Q: This does not answer my questions at all. What can I do?
# A: head over to #franklin-chat on Slack or
# github.com/adobe/helix-home/issues and ask your question
# there.
User-agent: *
Disallow: /
```
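Only the last two lines of the default file affect crawlers: `User-agent: *` matches every crawler and `Disallow: /` matches every path. As a rough illustration (not part of Franklin), Python's standard `urllib.robotparser` module confirms this; `CUSTOM_ROBOTS` is a made-up example of what a site repository might contain, and `can_crawl` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Directive lines of the default Helix robots.txt (the comment lines are
# omitted here; the parser strips comments anyway).
DEFAULT_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that a site repository might contain.
CUSTOM_ROBOTS = """\
User-agent: *
Disallow: /drafts/
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name, robots in (("default", DEFAULT_ROBOTS), ("custom", CUSTOM_ROBOTS)):
        # default -> False (everything blocked), custom -> True
        print(name, can_crawl(robots, "https://example.com/blog"))
```

With the default file every URL is blocked for every crawler, which is why seeing it on a production domain effectively removes the site from search engines.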
Phrasing issue in the default robots.txt
The problem is the part defining the "real site". The message states that:
Problem 1: "Franklin detects that you are ready for production"
This is not actually the case: whether the default robots.txt is returned is determined by the presence of the `x-forwarded-host` header in the BYOCDN configuration. A customer reading the message would therefore try to find out where to configure the example.com domain in Helix.
The Helix documentation does not mention the domain anywhere except in the push invalidation configuration. The BYOCDN documentation does not explain that `x-forwarded-host` is what defines the "real site"; it only shows a screenshot with the header configured.
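To make the point concrete, here is a minimal Python model of the behaviour described above, assuming the only input that matters is whether a non-empty `x-forwarded-host` header reaches the origin. It is a hypothetical simplification, not Franklin's actual code; `select_robots_txt`, the shortened `DEFAULT_ROBOTS` string and the origin hostname `main--site--org.hlx.live` are made-up names. Note that nothing in it checks whether a domain has been "added" anywhere.

```python
from typing import Mapping

# Hypothetical, shortened stand-in for the default Helix robots.txt.
DEFAULT_ROBOTS = "User-agent: *\nDisallow: /\n"


def select_robots_txt(headers: Mapping[str, str], repo_robots: str) -> str:
    """Hypothetical model of which robots.txt the origin would serve.

    Assumption: the decision depends only on whether the request carries a
    non-empty x-forwarded-host header, as described in this issue.
    """
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded_host = lowered.get("x-forwarded-host", "").strip()
    if forwarded_host:
        # Request came through a CDN that sets the header: serve the repo file.
        return repo_robots
    # Direct hit on *.hlx.page / *.hlx.live, or a CDN missing the header.
    return DEFAULT_ROBOTS


if __name__ == "__main__":
    repo = "User-agent: *\nAllow: /\n"
    # CDN configured without x-forwarded-host: the default file is served.
    print(select_robots_txt({"Host": "main--site--org.hlx.live"}, repo))
    # CDN forwarding the viewer host: the repository file is served.
    print(select_robots_txt({"X-Forwarded-Host": "example.com"}, repo))
```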
Problem 2: "but only on example.com"
This statement is not accurate: once the CDN is correctly configured, any domain routed through that CDN will serve the robots.txt from the repository. Rephrasing this passage might help users understand the issue.
For example, if you are using CloudFront, the repository robots.txt would be returned on both the "real site" domain (i.e. example.com) and your CloudFront distribution domain (randomid123.cloudfront.net).
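Both hostnames behave the same because the CDN, not Franklin, decides what goes into `x-forwarded-host`. Assuming the CDN copies the viewer's `Host` header into it (which the CloudFront example above implies), whatever name the visitor used reaches the origin. The Python sketch below models that CDN step; `build_origin_headers` and `ORIGIN_HOST` are hypothetical names, and real CDNs express this in their own configuration languages rather than Python.

```python
from typing import Dict, Mapping

# Hypothetical Franklin origin for this site (<branch>--<repo>--<owner>.hlx.live).
ORIGIN_HOST = "main--site--org.hlx.live"


def build_origin_headers(viewer_headers: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical CDN step: forward the viewer's Host as x-forwarded-host."""
    origin_headers = {name.lower(): value for name, value in viewer_headers.items()}
    viewer_host = origin_headers.get("host", "")
    if viewer_host:
        # Whatever hostname the visitor used ends up here, so every domain
        # pointed at this CDN configuration is treated as the "real site".
        origin_headers["x-forwarded-host"] = viewer_host
    # The CDN fetches from the Franklin origin, so Host is rewritten to it.
    origin_headers["host"] = ORIGIN_HOST
    return origin_headers


if __name__ == "__main__":
    for host in ("example.com", "randomid123.cloudfront.net"):
        print(host, "->", build_origin_headers({"Host": host})["x-forwarded-host"])
```

If the CDN omits this step, no request carries the header, and the origin keeps serving the default file on every domain.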
Behaviour in which the problem appears
The current problematic behavior is the following:
1. Create a new website.
2. Configure the BYOCDN but omit the `x-forwarded-host` header (by mistake, let's say).
3. See the default robots.txt.
4. Having read the message, commit a robots.txt to your repository.
5. Everything works as expected, except that the default robots.txt is still returned.
Suggested solution
- Add information about the importance of `x-forwarded-host` to the BYOCDN documentation.
- Add a check for the presence of `x-forwarded-host` to the Go-Live verification documentation.
- Change the default robots.txt text to mention that if users see the default robots.txt on the "real site", it is most probably due to a CDN configuration problem. This would point them to the amended documentation above.
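As an illustration of what such a Go-Live check could look like, the Python sketch below fetches `/robots.txt` from the production domain and flags it if it still contains the first line of the default file (`# Helix robots.txt FAQ`). Matching on that comment line is an assumption that holds only as long as the default text keeps it; `is_default_robots` and `check_production_robots` are hypothetical helpers, not part of any Franklin tooling.

```python
import urllib.request

# First line of the default Helix robots.txt quoted in this issue.
DEFAULT_MARKER = "# Helix robots.txt FAQ"


def is_default_robots(body: str) -> bool:
    """Return True if the robots.txt body looks like the Helix default."""
    return DEFAULT_MARKER in body


def check_production_robots(domain: str, timeout: float = 10.0) -> bool:
    """Fetch https://<domain>/robots.txt; return True if it is NOT the default."""
    url = f"https://{domain}/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
    if is_default_robots(body):
        print(f"{url} serves the default robots.txt; "
              "check that the CDN sets x-forwarded-host")
        return False
    return True
```

For example, `check_production_robots("example.com")` would return `False` for a CDN configured as in the reproduction steps above, and `True` once the header is forwarded and the repository file is served.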