Apache Airflow is an open-source workflow management platform for data engineering pipelines, designed to programmatically author, schedule, and monitor workflows. It is one of the most widely used data orchestration tools, but it can be difficult for beginners to set up. This repository serves as a template the author can reuse in future developments, and it also documents roadmaps for decision-making purposes.
- Create a docker-compose for initialization and/or admin accounts (Postgres DB)
- Create a docker-compose for users (Postgres DB)
- Create a docker-compose for testing (SQLite DB); a minimal sketch follows this list
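As a rough illustration of the test variant, a single-container Airflow in `standalone` mode falls back to a local SQLite metadata database. The sketch below is hypothetical: the service name, image tag, and volume paths are assumptions, not the repository's actual `docker-compose.tests.yaml`.

```yaml
# Hypothetical docker-compose.tests.yaml sketch -- not the repository's actual file.
# "standalone" runs the webserver, scheduler, and triggerer in one container and
# defaults to a local SQLite metadata database.
services:
  airflow-test:
    image: apache/airflow:2.9.3        # assumed image tag
    command: standalone
    ports:
      - "8080:8080"                    # expose the Airflow UI on localhost:8080
    volumes:
      - ./dags:/opt/airflow/dags       # assumed local DAGs folder
    environment:
      AIRFLOW__CORE__LOAD_EXAMPLES: "false"
```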
For every development, the author almost always uses a containerized environment; the objective is portability across different environments.
- Install the latest Docker on your local machine.
Initialize your Airflow Docker environment. If you are a developer and/or admin:
docker compose -f "docker-compose.yaml" up -d --build
If you are a non-technical user:
docker compose -f "docker-compose.users.yaml" up -d --build
If you need a test environment:
docker compose -f "docker-compose.tests.yaml" up -d --build
Then open the Airflow UI in your browser:
http://localhost:8080
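The UI is reachable at that address only if the webserver service publishes port 8080. The mapping below is what this template is assumed to use; the service name is illustrative.

```yaml
# Assumed port mapping on the webserver service (service name is illustrative):
services:
  airflow-webserver:
    ports:
      - "8080:8080"   # host:container -- the Airflow UI listens on 8080 inside the container
```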
The following tasks are planned with low-cost but highly scalable development in mind.
- Set up the Dockerfile and docker-compose files.
- Create a test Airflow environment with Docker.
- Create a production Airflow environment with Docker.
- Connect the metadata database to a cloud Postgres instance (free tier: Neon Tech Postgres); see the first sketch after this list.
- Connect the metadata database to AWS, Azure, or GCP (paid option, if operational load is high).
- Connect Airflow with Grafana, Prometheus, and StatsD; see the second sketch after this list.
- Set up Grafana with Airflow.
- Set up Grafana visualizations (JSON file).
- Set up a Grafana Cloud connection (if operational load is high).
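For the cloud-Postgres roadmap item, the usual approach is to point Airflow's metadata database at the external instance via its SQLAlchemy connection string. A minimal sketch, assuming Airflow 2.3+ and a placeholder Neon endpoint (the host, user, and database names below are not real):

```yaml
# Sketch only: wiring the metadata database to an external Postgres (e.g. Neon).
# The connection string is a placeholder, not a real endpoint.
services:
  airflow-webserver:
    environment:
      # Airflow 2.3+ reads the metadata DB from the [database] section;
      # older releases use AIRFLOW__CORE__SQL_ALCHEMY_CONN instead.
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: >-
        postgresql+psycopg2://<user>:<password>@<host>.neon.tech:5432/<dbname>?sslmode=require
```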
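Similarly, the Grafana/Prometheus/StatsD items typically start by turning on Airflow's StatsD metrics emitter and pointing it at a statsd-exporter sidecar that Prometheus can scrape. A hedged sketch; the service names are assumptions, not this repository's actual configuration.

```yaml
# Sketch only: enable Airflow's StatsD metrics and add a statsd-exporter sidecar.
services:
  airflow-scheduler:
    environment:
      AIRFLOW__METRICS__STATSD_ON: "true"
      AIRFLOW__METRICS__STATSD_HOST: statsd-exporter   # assumed sidecar service name
      AIRFLOW__METRICS__STATSD_PORT: "8125"
      AIRFLOW__METRICS__STATSD_PREFIX: airflow

  statsd-exporter:
    image: prom/statsd-exporter:latest
    ports:
      - "9102:9102"       # Prometheus scrape endpoint
      - "8125:8125/udp"   # StatsD ingestion from Airflow
```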