With Django Datawatch you can implement arbitrary checks on data, review their status, and describe how to resolve them. Think of Nagios/Icinga for data.
Currently celery is required to run the checks. Datawatch may support different backends in the future.
$ pip install django-datawatch
Add django_datawatch to your INSTALLED_APPS.
Create checks.py inside your module.
from datetime import datetime

from celery.schedules import crontab

from django_datawatch.datawatch import datawatch
from django_datawatch.base import BaseCheck, CheckResponse
from django_datawatch.models import Result


@datawatch.register
class CheckTime(BaseCheck):
    run_every = crontab(minute='*/5')  # scheduler will execute this check every 5 minutes

    def generate(self):
        yield datetime.now()

    def check(self, payload):
        response = CheckResponse()
        if payload.hour <= 7:
            response.set_status(Result.STATUS.ok)
        elif payload.hour <= 12:
            response.set_status(Result.STATUS.warning)
        else:
            response.set_status(Result.STATUS.critical)
        return response

    def get_identifier(self, payload):
        # payload will be our datetime object that we are getting from generate method
        return payload

    def get_payload(self, identifier):
        # as get_identifier returns the object we don't need to process it
        # we can return identifier directly
        return identifier
generate must yield the payloads to be checked. The check method is then called for every payload.
check must return an instance of CheckResponse.
get_identifier must return a unique identifier for the payload.
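To make the contract between these methods concrete, here is a minimal, framework-free sketch of how a runner could drive a check. It does not import django_datawatch: SketchResponse, SketchCheck, and run_check are hypothetical stand-ins, the status strings are placeholders for the Result.STATUS values, and the real backends may work differently.

```python
from datetime import datetime


class SketchResponse:
    """Hypothetical stand-in for CheckResponse; it only stores a status string."""

    def __init__(self):
        self.status = None

    def set_status(self, status):
        self.status = status


class SketchCheck:
    """Mirrors the shape of a check class without depending on Django."""

    def generate(self):
        # Yield every payload that should be checked (two fixed times here).
        yield datetime(2024, 1, 1, 6, 0)
        yield datetime(2024, 1, 1, 15, 0)

    def check(self, payload):
        response = SketchResponse()
        if payload.hour <= 7:
            response.set_status("ok")
        elif payload.hour <= 12:
            response.set_status("warning")
        else:
            response.set_status("critical")
        return response

    def get_identifier(self, payload):
        # Must be unique per payload; the ISO timestamp is unique in this sketch.
        return payload.isoformat()


def run_check(check):
    """Call check() once per generated payload and key the results by identifier."""
    results = {}
    for payload in check.generate():
        results[check.get_identifier(payload)] = check.check(payload).status
    return results


results = run_check(SketchCheck())
```

Each generated payload produces exactly one result, and the identifier is what lets a later run find and update that same result.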
Check updates for individual payloads can also be triggered when related datasets are changed. The map for update triggers is defined in the Check class' trigger_update attribute.
trigger_update = dict(subproduct=models_customer.SubProduct)
The key is a slug that names your trigger, and the value is the model that issues the trigger when saved. You must implement a resolver method for each entry, named get_<slug>_payload, which returns the payload to check (the same datatype that .check expects and .generate yields).
def get_subproduct_payload(self, instance):
    return instance.product
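The naming convention can be illustrated with a small, self-contained sketch. This is not Datawatch's actual implementation: resolve_payload, SketchSubProduct, and SketchCheck are hypothetical, and the sketch only shows how a slug from trigger_update could map to its get_<slug>_payload resolver.

```python
class SketchSubProduct:
    """Hypothetical model stand-in; a real trigger would use a Django model."""

    def __init__(self, product):
        self.product = product


class SketchCheck:
    # slug -> model whose save should trigger an update
    trigger_update = dict(subproduct=SketchSubProduct)

    def get_subproduct_payload(self, instance):
        # Return the payload that check() expects for this instance.
        return instance.product


def resolve_payload(check, instance):
    """Find the slug registered for the instance's model and call its resolver."""
    for slug, model in check.trigger_update.items():
        if isinstance(instance, model):
            resolver = getattr(check, "get_%s_payload" % slug)
            return resolver(instance)
    return None  # no trigger is registered for this model


payload = resolve_payload(SketchCheck(), SketchSubProduct(product="product-42"))
```

Under these assumptions, saving a SketchSubProduct would lead to a re-check of its product only, rather than of every payload the check generates.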
A management command is provided to queue the execution of all checks based on their schedule. Add a crontab to run this command every minute and it will check if there's something to do.
$ ./manage.py datawatch_run_checks
$ ./manage.py datawatch_run_checks --slug=example.checks.UserHasEnoughBalance
A management command is provided to forcefully refresh all existing results for a check. This comes in handy if you change the logic of your check and don't want to wait for the next periodic execution or update trigger.
$ ./manage.py datawatch_refresh_results
$ ./manage.py datawatch_refresh_results --slug=example.checks.UserHasEnoughBalance
$ ./manage.py datawatch_list_checks
Remove the unnecessary check results if you've removed the code for a check.
$ ./manage.py datawatch_delete_ghost_results
DJANGO_DATAWATCH_BACKEND = 'django_datawatch.backends.synchronous'
DJANGO_DATAWATCH_CELERY_QUEUE_NAME = 'django_datawatch'
DJANGO_DATAWATCH_RUN_SIGNALS = True
DJANGO_DATAWATCH_BACKEND selects the backend that runs the tasks. Supported values are 'django_datawatch.backends.synchronous' and 'django_datawatch.backends.celery'.
Default: 'django_datawatch.backends.synchronous'
DJANGO_DATAWATCH_CELERY_QUEUE_NAME sets the celery queue name for async tasks (applies only if the celery backend is chosen).
Default: 'django_datawatch'
DJANGO_DATAWATCH_RUN_SIGNALS can be set to False to disable running post_save updates, for example during unit tests.
Default: True
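As an illustration, a settings fragment that switches to the celery backend and turns off post_save updates during test runs might look like the sketch below. The three setting names and values come from this README; the TESTING flag and the way it is derived from sys.argv are assumptions for the example, not something Datawatch provides.

```python
import sys

# Assumption: tests are started with "./manage.py test", so "test" appears in argv.
TESTING = "test" in sys.argv

# Run checks through celery instead of the default synchronous backend.
DJANGO_DATAWATCH_BACKEND = 'django_datawatch.backends.celery'

# Queue that the celery worker must consume (this is also the default name).
DJANGO_DATAWATCH_CELERY_QUEUE_NAME = 'django_datawatch'

# Skip post_save update triggers while the test suite is running.
DJANGO_DATAWATCH_RUN_SIGNALS = not TESTING
```

If your project detects test runs differently (for example through a dedicated test settings module), set DJANGO_DATAWATCH_RUN_SIGNALS = False there instead.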
- docker (at least 17.12.0+)
- docker-compose (at least 1.18.0)
- docker-hostmanager
To access the application in your browser, your host machine must be able to resolve the host name of your container. We're using docker-hostmanager to manage the hosts file entries.
Linux:
$ docker run -d --name docker-hostmanager --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v /etc/hosts:/hosts iamluc/docker-hostmanager
For other environments, see https://github.com/iamluc/docker-hostmanager
We've included an example app to show how django_datawatch works. Start by launching the included docker container.
docker-compose up -d
Then set up the example app environment.
docker-compose exec django ./manage.py migrate
docker-compose exec django ./manage.py loaddata example
The installed superuser is "example" with password "datawatch".
Log in to the admin interface and open http://datawatch.rh-dev.eu:8000/ afterwards. You'll see an empty dashboard, because no checks have been run yet. Let's enqueue an update.
docker-compose exec django ./manage.py datawatch_run_checks --force
The checks for the example app are run synchronously and should be updated immediately. If you decide to switch to the celery backend, you should now start a celery worker to process the checks.
docker-compose exec django celery worker -A example -l DEBUG -Q django_datawatch
After refreshing the dashboard view, you will see some failed checks.
bumpversion is used to manage releases.
Add your changes to the CHANGELOG and run bumpversion <major|minor|patch>, then push (including tags).