This guide provides instructions on how to set up Metaflow with Airflow and MinIO on Minikube
Ensure the following software is installed on your system:
- Minikube: Download and install from Minikube's official website
- kubectl: Download and install from Kubernetes' official website
- Helm: Download and install from Helm's official website
- ngrok: Download and install from ngrok's official website
- An ngrok key created from the ngrok dashboard
-
Start Minikube by executing the following command:
minikube start --cpus 6 --memory 10240
-
Add the MinIO Helm repository and update it:
helm repo add minio https://charts.min.io/ helm repo update
-
Install MinIO using Helm:
helm install --set resources.requests.memory=512Mi --set replicas=1 --set persistence.enabled=false --set mode=standalone --set rootUser=rootuser,rootPassword=rootpass123 minio-s3 minio/minio
-
Set up port forwarding for the MinIO service:
kubectl port-forward svc/minio-s3 9000 --namespace default
The MinIO service will now be accessible at http://localhost:9000.
-
Install metaflow and kubernetes:
pip install metaflow kubernetes
-
Create a metaflow bucket named
metaflow-test
in MinIO using the create-bucket.py Python script. The--access-key
/--secret-key
correspond to therootUser
/rootPassword
set in Step 4.python create_bucket.py --access-key rootuser --secret-key rootpass123 --bucket-name metaflow-test
-
Create an ngrok tunnel to the port-forwarded MinIO service in a separate terminal window:
ngrok http 9000
-
Create a Kubernetes secret for MinIO. This secret will be used by Metaflow Tasks and Metaflow UI running on Kubernetes to access data stored in MinIO:
kubectl create secret generic minio-secret --from-literal=AWS_ACCESS_KEY_ID=rootuser --from-literal=AWS_SECRET_ACCESS_KEY=rootpass123
-
Install Metaflow in the
default
namespace and enable ingress on minikube:minikube addons enable ingress git clone [email protected]:outerbounds/metaflow-tools.git .mf-tools helm upgrade --install metaflow .mf-tools/k8s/helm/metaflow \ --timeout 15m0s \ --namespace default \ --set metaflow-ui.uiBackend.metaflowDatastoreSysRootS3=s3://metaflow-test/metaflow \ --set metaflow-ui.uiBackend.metaflowS3EndpointURL="<NGROK-TUNNEL-URL-COMES-HERE>" \ --set "metaflow-ui.envFrom[0].secretRef.name=minio-secret" \ --set metaflow-ui.ingress.className=nginx \ --set metaflow-ui.ingress.enabled=true
-
Create a metaflow configuration file under
~/.metaflowconfig/config.json
. Ensure you name itconfig_airflow_minio.json
:{ "METAFLOW_S3_ENDPOINT_URL": "<NGROK TUNNEL URL COMES HERE>", "METAFLOW_DEFAULT_DATASTORE": "s3", "METAFLOW_DATASTORE_SYSROOT_S3":"s3://metaflow-test/metaflow", "METAFLOW_DATATOOLS_S3ROOT": "s3://metaflow-test/data", "METAFLOW_DEFAULT_METADATA" : "service", "METAFLOW_KUBERNETES_SECRETS": "minio-secret", "METAFLOW_SERVICE_INTERNAL_URL": "http://metaflow-metaflow-service.default.svc.cluster.local:8080", "METAFLOW_AIRFLOW_KUBERNETES_KUBECONFIG_CONTEXT": "minikube" }
-
Start the Airflow installation in a separate terminal window:
pip install apache-airflow apache-airflow-providers-cncf-kubernetes mkdir ~/airflow && mkdir ~/airflow/dags && airflow standalone
-
Create the Airflow DAG file for the helloflow.py:
export METAFLOW_PROFILE=airflow_minio python helloflow.py airflow create minio-test-dag.py cp minio-test-dag.py ~/airflow/dags/minio-test-dag.py
-
Ensure DAGs get loaded with
airflow dags reserialize
. -
Trigger the DAG from the Airflow UI. The DAG named
HelloFlow
will appear on the Airflow UI.