The osmx-update script runs from cron on an EC2 instance. It can fail in one of a few clear ways:
- Unable to retrieve the new minutely replication file
- Unable to generate the augmented diff, either due to bugs in adiff.py or some data issue
- Unable to apply the minutely change file to the osmx database
When the script fails and does not commit the current minutely diff to the osmx database, future invocations via cron will continue to retry the failed minute until it succeeds, which in some cases may never happen.
We want to capture these failures through some form of monitoring and alert our team when they occur, so that appropriate action can be taken.
Currently, osmx-update will throw an exception and exit non-zero if any part of the program crashes. A first iteration could watch for non-zero exit codes and send alerts via a pre-configured AWS service.
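A minimal sketch of that first iteration, assuming a small Python wrapper is what cron invokes instead of calling osmx-update directly. The `run_and_alert` name and the `notify` callback are hypothetical; `notify` stands in for whichever pre-configured AWS service ends up sending the alert.

```python
import subprocess
from typing import Callable, Sequence


def run_and_alert(cmd: Sequence[str], notify: Callable[[str], None]) -> int:
    """Run cmd and call notify(message) if it exits non-zero.

    notify is a hypothetical hook, e.g. a function that publishes
    the message to an alerting service.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Include only the tail of stderr so the alert stays short.
        tail = result.stderr[-2000:]
        notify(f"{cmd[0]} exited with code {result.returncode}\n{tail}")
    return result.returncode
```

The wrapper returns the original exit code, so cron and anything else that inspects it still sees the failure. Note that this only reports each failed run; it does not by itself distinguish a one-off failure from a minute that keeps failing.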
In my case I have a cron job that runs osmx query planet.osmx timestamp and writes the diff in seconds to a monitoring api (CloudWatch) - and has an alarm triggered by being more than a few minutes behind. This accomplishes all of the above in a simple way, and also lets you see a graph of how far replication is behind - if osm.org/replication is down or your disks become very slow, for example.
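A rough sketch of that lag check, under a few assumptions: the timestamp printed by `osmx query planet.osmx timestamp` is taken to be an ISO 8601 UTC string (the exact output format is not confirmed here), and the `OSMX` namespace and `ReplicationLagSeconds` metric name are made up for illustration. The publish step uses boto3's CloudWatch `put_metric_data` call and needs boto3 plus AWS credentials to run.

```python
from datetime import datetime, timezone
from typing import Optional


def replication_lag_seconds(timestamp: str, now: Optional[datetime] = None) -> float:
    """Return how many seconds the given timestamp is behind now.

    Assumes an ISO 8601 timestamp; a trailing 'Z' or a missing
    offset is treated as UTC.
    """
    parsed = datetime.fromisoformat(timestamp.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - parsed).total_seconds()


def publish_lag(lag_seconds: float,
                namespace: str = "OSMX",  # hypothetical namespace
                metric_name: str = "ReplicationLagSeconds") -> None:  # hypothetical name
    """Send the lag to CloudWatch as a custom metric."""
    import boto3  # third-party; imported here so the lag math works without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,
        MetricData=[{"MetricName": metric_name, "Value": lag_seconds, "Unit": "Seconds"}],
    )
```

A CloudWatch alarm on that metric, with a threshold of a few minutes, would then cover all three failure modes, since any of them leaves the database timestamp stuck while the lag keeps growing.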