add custom robots.txt to block indexing /user and AI bots #55
base: master
Conversation
👍🏽 ... based on the default robots.txt link you shared, shouldn't we just be overriding specific blocks (all_user_agents, additional_user_agents, etc.) rather than creating a completely new robots.txt file? Something like the sketch below.
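A minimal sketch of the block-override approach, assuming CKAN's default robots.txt is a Jinja template exposing the blocks named above (the template path and the bot names are illustrative assumptions, not taken from this PR):

```jinja
{# templates/home/robots.txt — path is an assumption; it must shadow
   wherever CKAN's default robots.txt template lives #}
{% ckan_extends %}

{% block additional_user_agents %}
# Keep user profile pages out of all indexes
User-agent: *
Disallow: /user/

# Fully block a generic AI bot (illustrative name)
User-agent: GPTBot
Disallow: /
{% endblock %}
```

Either way, the override template would only need to carry the blocks that change, and it would keep picking up upstream changes to the rest of the default file.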
Disallow: /
# Generated by RoboShield (https://roboshield.trustlab.africa)
RoboShield FTW!
Disallow: /user/

# Amazonbot
User-agent: Amazonbot
Compared to other sites (The Guardian, Washington Post, BBC, etc.), we may be missing some crawlers.
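For comparison, a sample of AI-crawler blocks of the kind commonly seen in major news publishers' robots.txt files (the agent names here are illustrative and should be checked against those sites before adopting them):

```
# AI/LLM crawlers frequently disallowed by news publishers
User-agent: GPTBot
User-agent: CCBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: Bytespider
Disallow: /
```

Per the robots.txt spec (RFC 9309), several User-agent lines may share one group of rules, so a single Disallow covers the whole list.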
This PR creates a custom robots.txt overriding CKAN's default robots.txt. We'd like to block indexing and/or crawling of /user, and to block generic AI bots that are not on Cloudflare's verified bots list.
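A minimal sketch of the intended shape of the file, given the two goals above (the specific agents are placeholders; the real file should enumerate bots absent from Cloudflare's verified bots list):

```
# Keep user profile pages out of all indexes
User-agent: *
Disallow: /user/

# Deny generic/unverified AI bots entirely (illustrative names)
User-agent: GPTBot
User-agent: Amazonbot
Disallow: /
```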