
Data Pipelines with Airflow

Contents

  • Data Source
  • Starting Airflow
  • Airflow S3 Connection to MinIO
  • Running Tests
  • References

Starting Airflow

Before we run Airflow, let's create these folders first:

mkdir -p mnt/dags mnt/logs mnt/plugins mnt/tests

On Linux, please make sure to configure the Airflow user ID for Docker Compose:

echo -e "AIRFLOW_UID=$(id -u)" > .env

With LocalExecutor

docker-compose build
docker-compose up

With CeleryExecutor

docker-compose -f docker-compose-celery.yml build
docker-compose -f docker-compose-celery.yml up

With SequentialExecutor (NOT recommended for production use)

docker-compose -f docker-compose-sequential.yml build
docker-compose -f docker-compose-sequential.yml up

To clean up the project, press Ctrl+C, then run:

docker-compose down
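
Once Airflow is up, any DAG file placed in mnt/dags should show up in the web UI (the docker-compose files are assumed to mount that folder into the containers as /opt/airflow/dags). Below is a minimal sketch of such a file, under a hypothetical name mnt/dags/hello_dag.py and assuming Airflow 2.x:

# mnt/dags/hello_dag.py -- hypothetical file name, assuming Airflow 2.x
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello from Airflow!")


with DAG(
    dag_id="hello_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # trigger manually from the UI
    catchup=False,
) as dag:
    PythonOperator(
        task_id="say_hello",
        python_callable=say_hello,
    )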

Airflow S3 Connection to MinIO

Since MinIO offers S3-compatible object storage, we can set the connection type to S3. However, we'll need to set an extra option so that Airflow connects to MinIO instead of AWS S3.

  • Connection Name: minio or any name you like
  • Connection Type: S3
  • Login: <replace_here_with_your_minio_access_key>
  • Password: <replace_here_with_your_minio_secret_key>
  • Extra: a JSON object with the following properties:
    {
      "host": "http://minio:9000"
    }

Note: If we were using AWS S3, we wouldn't need to specify the host in the extra.
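
With the connection saved, a task can talk to MinIO through the regular S3 hook by referencing the connection name above. A minimal sketch, assuming Airflow 2.x with the Amazon provider installed and a bucket named landing (a hypothetical name) already created in MinIO:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def upload_to_minio():
    # "minio" is the connection name configured above
    hook = S3Hook(aws_conn_id="minio")
    hook.load_string(
        string_data="hello, MinIO",
        key="example/hello.txt",
        bucket_name="landing",  # assumed bucket name
        replace=True,
    )

This function could then be wired into a DAG with a PythonOperator, the same way as in the earlier sketch.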

Running Tests

First we need to install pytest:

pip install pytest

Run tests:

export PYTHONPATH=/opt/airflow/plugins
pytest
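
As an example of what could live in mnt/tests, here is a minimal DAG-integrity check using Airflow's DagBag; the file name and the dag folder path are assumptions based on the folders created earlier and the usual mount point inside the containers:

# mnt/tests/test_dag_integrity.py -- hypothetical test file
from airflow.models import DagBag


def test_dags_load_without_import_errors():
    # /opt/airflow/dags is the assumed mount point of mnt/dags inside the container
    dag_bag = DagBag(dag_folder="/opt/airflow/dags", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"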

References
