Docs site outage response playbook
Hillary Fraley edited this page Nov 22, 2022 · 2 revisions
Sensu docs are monitored using the Sensu CR Monitoring Caviar project. See also the live demo.
When you receive a PagerDuty notification that the Sensu docs site is down, first visit docs.sensu.io to confirm the site is not working.
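If you prefer to confirm the outage from a terminal as well as a browser, a minimal sketch along these lines may help. It assumes the site is served over HTTPS at docs.sensu.io and treats any 2xx or 3xx response as "up"; the function name `site_is_up` and the 10-second timeout are illustrative choices, not part of the monitoring setup.

```python
import urllib.request

# Assumption: the docs site is reachable over HTTPS at this address.
DOCS_URL = "https://docs.sensu.io"


def site_is_up(url=DOCS_URL, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx or 3xx status.

    `opener` is injectable so the check can be exercised without
    touching the network; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # urllib raises URLError/HTTPError (both OSError subclasses) for
        # DNS failures, refused connections, timeouts, and 4xx/5xx replies.
        return False
```

Calling `site_is_up()` with no arguments checks the live site. A `False` result is only a hint: still load the page in a browser before treating the alert as a real outage.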
If the site is in fact down, post in the #sensu-alerts Slack channel to confirm that you are responding to the PagerDuty notification. Then, redeploy the site with Heroku:
- Log in to the Sensu docs site Heroku app.
- Click the Deploy tab.
- Scroll down to "Manual Deploy" and click Deploy Branch.
- Wait for the docs site to build.
- Visit docs.sensu.io to confirm that the manual deployment fixed the problem.
- Post to the #sensu-alerts Slack channel to confirm that you've resolved the outage.
If the manual deployment did not fix the problem:
- In the Sensu docs site Heroku app, click the Resources tab.
- Scroll down to the "Add-ons" list and click Papertrail.
- In the Papertrail log window, search for the word "error" (without the quotation marks).
- Read through the search results and find the error that matches the date and time of the docs site failure.
- Find the error code in the list of Heroku error codes.
- Paste the Papertrail log entry for the error, along with a link to the relevant error code information, into the #alerts and #reliability Slack channels.
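If you have exported or copied a batch of log lines, the search-and-match steps above can be sketched in a few lines of Python. This is an illustration under assumptions: it supposes each line starts with an ISO 8601 timestamp followed by a space (Papertrail's actual export format may differ), and it looks for Heroku-style codes such as `code=H10`. The function name `errors_near` and the 10-minute window are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Heroku error codes are a letter (H, R, or L) followed by two digits.
ERROR_CODE = re.compile(r"\bcode=([HRL]\d{2})\b")


def errors_near(lines, outage_time, window_minutes=10):
    """Return (timestamp, code, line) for error lines near the outage.

    Assumes each line begins with an ISO 8601 timestamp and a space.
    `outage_time` must be timezone-aware if the log timestamps are.
    """
    window = timedelta(minutes=window_minutes)
    matches = []
    for line in lines:
        if "error" not in line.lower():
            continue
        stamp, _, _ = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # line does not start with a parseable timestamp
        if abs(when - outage_time) <= window:
            found = ERROR_CODE.search(line)
            matches.append((when, found.group(1) if found else None, line))
    return matches
```

Any code it returns (for example `H10`) is what you would then look up in the list of Heroku error codes.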
If you do not find an error in the Papertrail log, or if the error code indicates that Heroku is having issues:
- Check the Heroku status page to confirm the problem.
- Post a link to the Heroku status page information in the #alerts and #reliability Slack channels.
If the problem is a Heroku issue, the site may not come back up until Heroku resolves it. In the meantime, you can try restarting all dynos:
- Open the Sensu docs site Heroku app.
- Click the More button in the upper-right corner of the page.
- Click Restart all dynos.
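The dashboard steps above also have a command-line counterpart, `heroku ps:restart`, which restarts every dyno when no dyno name is given. The sketch below wraps that command in Python; it assumes the Heroku CLI is installed and logged in, and the app name `example-docs-app` in the usage note is a placeholder, not the real name of the docs site app.

```python
import subprocess


def restart_command(app_name):
    """Build the Heroku CLI command that restarts all dynos for an app."""
    return ["heroku", "ps:restart", "--app", app_name]


def restart_all_dynos(app_name, run=subprocess.run):
    """Run the restart command; raises CalledProcessError on failure.

    `run` is injectable so the call can be tested without the CLI.
    """
    return run(restart_command(app_name), check=True)
```

For example, `restart_all_dynos("example-docs-app")` runs `heroku ps:restart --app example-docs-app`. Substitute the actual app name shown in the Heroku dashboard.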