UX Design: Deploying Pipelines to Airflow #452

Open
dorx opened this issue Dec 11, 2021 · 1 comment
Labels: documentation, User Story, ux_design

Comments


dorx commented Dec 11, 2021

Goal
Add an Airflow DAG to a user-specified Airflow server for an artifact.

Current user workflow
A data scientist has developed an artifact, say an ML model called clf, in a Jupyter notebook. To create an Airflow DAG, they must manually write a Python script that translates the notebook code into the Airflow DSL for constructing a DAG. They then place that script in the DAG folder of the Airflow server they are submitting the DAG to.

User workflow with Linea
Note: This is agnostic of the entry point to Linea (CLI or IPython). We will discuss the UX at the API level.

Airflow config: the user specifies the URI for AIRFLOW_HOME in a Linea config file, say in lineapy/config.yml.

The user first calls lineapy.save(clf) to get the LineaArtifact object associated with clf, here named clf_artifact. The user then invokes lineapy.to_airflow(clf_artifact), which generates a dag.py file and sends it to the AIRFLOW_HOME directory.

to_airflow() takes an optional dict argument so that users who are familiar with the DAG's input parameters, such as schedule_interval and max_active_runs, can set them.

Function signature for to_airflow():

from typing import Dict, List, Optional, Union

def to_airflow(artifacts: Union[LineaArtifact, List[LineaArtifact]], props: Optional[Dict[str, str]] = None):
    ...

This allows users to pass in multiple artifacts for a single DAG.

Note: to_airflow() handles the transfer of the dag.py file to AIRFLOW_HOME as in the current implementation. This is potentially a point of further discussion.
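The transfer step in the note above could look roughly like the following. This is a minimal sketch, not lineapy's implementation: it assumes AIRFLOW_HOME is a local directory and that DAG files belong in its `dags` subfolder (Airflow's default). `LineaArtifact` is a hypothetical stand-in dataclass, `normalize` and `deploy_dag` are illustrative names, and the per-DAG file name is an assumption made so that two deployments do not overwrite each other.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional, Union


@dataclass
class LineaArtifact:
    """Hypothetical stand-in for lineapy's artifact class."""
    name: str
    code: str


def normalize(
    artifacts: Union[LineaArtifact, List[LineaArtifact]]
) -> List[LineaArtifact]:
    # Accept either a single artifact or a list, as in the proposed signature.
    return [artifacts] if isinstance(artifacts, LineaArtifact) else list(artifacts)


def deploy_dag(
    artifacts: Union[LineaArtifact, List[LineaArtifact]],
    airflow_home: str,
    props: Optional[Dict[str, str]] = None,
) -> Path:
    items = normalize(artifacts)
    dag_id = "_".join(a.name for a in items)

    # Placeholder file body: a real generator would emit a full DAG definition.
    lines = [f"# dag_id: {dag_id}"]
    lines += [f"# {key} = {value}" for key, value in sorted((props or {}).items())]
    lines += [a.code for a in items]

    # Assumption: DAG files live in <AIRFLOW_HOME>/dags.
    dags_dir = Path(airflow_home) / "dags"
    dags_dir.mkdir(parents=True, exist_ok=True)
    target = dags_dir / f"{dag_id}_dag.py"
    target.write_text("\n".join(lines) + "\n")
    return target
```

Returning the written path gives callers (CLI or IPython) something to report back to the user, and keeps the copy step easy to replace if the transfer mechanism is revisited.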

Desiderata

  • No dependency on airflow from lineapy

Proposed solution
Construct dag.py using Jinja templates.
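As a rough illustration of the template approach, the sketch below renders the text of a dag.py with jinja2 and never imports airflow, which is how the "no dependency" desideratum could be met. The template contents, the `render_dag` name, and the single-task layout are assumptions for illustration only. Values in `props` are rendered with `repr`, so this sketch expects numeric parameters such as max_active_runs to be passed as ints rather than strings.

```python
import textwrap

from jinja2 import Template

# The generated file imports airflow; this module does not.
DAG_TEMPLATE = Template(
    '''\
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def {{ task_name }}():
{{ task_body }}


with DAG(
    dag_id={{ dag_id }},
    start_date=datetime(2021, 1, 1),
{%- for key, value in dag_args %}
    {{ key }}={{ value }},
{%- endfor %}
) as dag:
    PythonOperator(task_id={{ task_id }}, python_callable={{ task_name }})
'''
)


def render_dag(task_name, task_code, props=None):
    """Return the source text of a dag.py for one task (hypothetical helper)."""
    # repr() turns Python values into source literals ('@daily', 1, ...).
    dag_args = [(key, repr(value)) for key, value in sorted((props or {}).items())]
    return DAG_TEMPLATE.render(
        task_name=task_name,
        task_body=textwrap.indent(task_code or "pass", "    "),
        dag_id=repr(f"{task_name}_dag"),
        task_id=repr(task_name),
        dag_args=dag_args,
    )
```

A call such as `render_dag("train_clf", "print('training')", {"schedule_interval": "@daily"})` returns a string that can be written to the DAG folder. Because the output is plain text, it can be syntax-checked with `compile()` on a machine where Airflow is not installed.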

@dorx dorx added documentation Improvements or additions to documentation ux_design User Story labels Dec 11, 2021

dorx commented Dec 11, 2021

CC @marov
