Monitor osmx-update for failures and send alerts #27

Open
CloudNiner opened this issue Sep 10, 2020 · 1 comment

Comments

CloudNiner commented Sep 10, 2020

The osmx-update script runs from cron on an EC2 instance. It can fail in a few distinct ways:

  • Unable to retrieve the new minutely replication file
  • Unable to generate the augmented diff, due to either a bug in adiff.py or a data issue
  • Unable to apply the minutely change file to the osmx database

When the script fails and does not commit the current minutely diff to the osmx database, every later cron invocation retries the same failed minute until it succeeds. In some cases it never succeeds, so replication stalls indefinitely.

We want to capture these failures through some form of monitoring and alert our team when they occur, so that appropriate action can be taken.

Currently, osmx-update raises an exception and exits with a non-zero status if any part of the program crashes. A first iteration could watch for non-zero exit codes and send alerts via a pre-configured AWS service.
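
As a rough sketch of that first iteration, a small Python wrapper could run the command from cron and send an alert whenever it exits non-zero. Everything here is an assumption rather than existing project code: the names run_and_alert and publish_to_sns, the choice of Amazon SNS as the AWS service, and the placeholder topic ARN are all hypothetical.

```python
import subprocess
import sys
from typing import Callable, Sequence


def publish_to_sns(subject: str, message: str) -> None:
    """Send an alert through Amazon SNS (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the wrapper has no hard dependency on it

    # Hypothetical placeholder ARN; replace with the team's real topic.
    topic_arn = "arn:aws:sns:us-east-1:123456789012:osmx-update-alerts"
    boto3.client("sns").publish(TopicArn=topic_arn, Subject=subject, Message=message)


def run_and_alert(
    command: Sequence[str],
    notify: Callable[[str, str], None] = publish_to_sns,
) -> int:
    """Run command and call notify(subject, message) if it exits non-zero."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Send only the tail of stderr so the alert stays short.
        tail = result.stderr[-2000:]
        notify(f"osmx-update failed (exit {result.returncode})", tail)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the real command and its arguments after the script name.
    sys.exit(run_and_alert(sys.argv[1:]))
```

The cron entry would then invoke this wrapper with the existing osmx-update command line as its arguments. Note that because a failed minute is retried on every run, this would alert once per minute until the problem is fixed, so some rate limiting on the alerting side is probably worth adding.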

bdon commented Sep 17, 2020

In my case I have a cron job that runs osmx query planet.osmx timestamp and writes the difference in seconds between that timestamp and the current time to a monitoring API (CloudWatch), with an alarm that triggers when replication is more than a few minutes behind. This accomplishes all of the above in a simple way, and also lets you see a graph of how far behind replication is, for example when osm.org/replication is down or your disks become very slow.
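
A minimal sketch of that approach in Python might look like the following. It assumes the timestamp command prints an ISO 8601 UTC value such as 2020-09-17T00:00:00Z; the function names, the OSMX namespace, and the ReplicationLagSeconds metric name are hypothetical choices for illustration, not part of osmx.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def replication_lag_seconds(timestamp: str, now: Optional[datetime] = None) -> float:
    """Seconds between a UTC timestamp like '2020-09-17T00:00:00Z' and now."""
    # Assumed output format; adjust the pattern if osmx prints something else.
    parsed = datetime.strptime(timestamp.strip(), "%Y-%m-%dT%H:%M:%SZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return (current - parsed).total_seconds()


def read_osmx_timestamp(db_path: str) -> str:
    """Run the command from the comment above and return its stdout."""
    result = subprocess.run(
        ["osmx", "query", db_path, "timestamp"],
        capture_output=True,
        text=True,
        check=True,  # raise if osmx itself fails
    )
    return result.stdout


def publish_lag(lag: float) -> None:
    """Write the lag to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the lag calculation works without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace="OSMX",  # hypothetical namespace
        MetricData=[
            {"MetricName": "ReplicationLagSeconds", "Value": lag, "Unit": "Seconds"}
        ],
    )
```

A cron job would call publish_lag(replication_lag_seconds(read_osmx_timestamp("planet.osmx"))) once a minute, and a CloudWatch alarm on the metric (for example, above 300 seconds) would handle the alerting. Configuring the alarm to treat missing data as breaching would also catch the case where the cron job itself stops running.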
