Hi @noklam, thanks for kicking off the discussion! In this situation what would the
Hello @noklam. This is a great question, and while I'm not sure I have all the answers, maybe I can provide some useful tips anyway. It is possible to do programmatic runs, i.e. to start a Kedro run through the Python API rather than through the CLI. This is handled through sessions and is exactly what
from kedro.framework.session import KedroSession

# Run the first project, then the second, each in its own session
with KedroSession.create(project_path="path/to/project/1") as session:
    session.run()
with KedroSession.create(project_path="path/to/project/2") as session:
    session.run()
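If more than two projects need to be chained, the same pattern can be wrapped in a loop. This is only a minimal sketch: it assumes a Kedro version that ships `kedro.framework.startup.bootstrap_project` (0.18 and later, as far as I know), and `run_kedro_project`, `run_projects` and the `run_one` parameter are hypothetical helper names, not part of Kedro. Whether several projects can share one Python process cleanly may depend on your Kedro version.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable


def run_kedro_project(project_path: Path) -> Any:
    """Run the default pipeline of one Kedro project (hypothetical helper)."""
    # Imported lazily so this module can be imported without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    # Assumption: your Kedro version requires bootstrapping before a session.
    bootstrap_project(project_path)
    with KedroSession.create(project_path=project_path) as session:
        # Return whatever session.run() returns for this project.
        return session.run()


def run_projects(
    project_paths: Iterable[str],
    run_one: Callable[[Path], Any] = run_kedro_project,
) -> Dict[str, Any]:
    """Run several projects one after another, keyed by the path given."""
    results: Dict[str, Any] = {}
    for path in project_paths:
        results[str(path)] = run_one(Path(path))
    return results
```

Calling `run_projects(["path/to/project/1", "path/to/project/2"])` would run both projects in order. The `run_one` argument is there only so the loop can be exercised without real Kedro projects.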
Hello @noklam, sorry for self-advertising, but for your record,
You can log the pipeline in MLflow programmatically:
import mlflow
from kedro.framework.context import load_context
from kedro_mlflow.mlflow import KedroPipelineModel
from kedro_mlflow.pipeline import pipeline_ml_factory

context = load_context(".")
pipelines = context.pipelines
catalog = context.io
# Convert your two pipelines to a PipelineML object
pipeline_training = pipeline_ml_factory(
    # Assumes you have a pipeline with nodes tagged "training" and "inference"
    training=pipelines["ml_pipeline"].only_nodes_with_tags("training"),
    inference=pipelines["ml_pipeline"].only_nodes_with_tags("inference"),
    input_name="instances",
)
# Artifacts are all the inputs of the inference pipeline that are persisted in the catalog
artifacts = pipeline_training.extract_pipeline_artifacts(catalog)
kedro_model = KedroPipelineModel(
    pipeline_ml=pipeline_training,
    catalog=catalog,
)
mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=kedro_model,
    artifacts=artifacts,
)
You will later be able to reuse it from a script, serve it, or even feed it in the catalog for a downstream task:
PROJECT_PATH = r"<your/project/path>"
RUN_ID = "<your-run-id>"
from kedro.framework.context import load_context
from kedro_mlflow.framework.context import get_mlflow_config
from mlflow.pyfunc import load_model

local_context = load_context(PROJECT_PATH)
mlflow_config = get_mlflow_config(local_context)
mlflow_config.setup(local_context)
instances = local_context.io.load("instances")
# The last path segment must match the artifact_path used when the model was logged
model = load_model(f"runs:/{RUN_ID}/kedro_mlflow_tutorial")
predictions = model.predict(instances)
You can find a detailed example with code here: https://github.com/Galileo-Galilei/kedro-mlflow-tutorial. A very nice feature is that if you declare your pipeline as a
Kedro does a nice job of structuring data science projects and building modular pipelines. I have been using it for development, but it is not clear to me how I should deploy and distribute the pipelines.
For example, consider the following use cases.
Distribute the pipeline to a third party
Say I package up a Kedro pipeline; it seems that the most straightforward way to run it is via the CLI. What if I have two pipelines distributed separately and I want them to run together? This is easy if both pipelines live in a single repository, but that will not be the case if I package each pipeline individually and share it with others.
Integrate pipeline with other Python code
For example, I have a machine learning pipeline that trains a model. In order to use it for deployment, I may need to perform the following steps
The current Kedro pipeline is pretty much a standalone application: it talks to the pipeline itself and to the filesystem directly, but not to other Python applications. That is, I want the pipeline to return the model, and the web service should load the model directly from memory instead of reading it from files. The pseudocode is similar to this.
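One way to picture this kind of in-memory handoff is sketched below. It is an illustration under assumptions, not Kedro API: `ModelService`, `train` and `model_key` are hypothetical names, and `train` stands for any callable that runs the training pipeline and returns its outputs as a dictionary.

```python
from typing import Any, Callable, Dict, Optional


class ModelService:
    """Hypothetical service-side holder that keeps the trained model in memory."""

    def __init__(
        self,
        train: Callable[[], Dict[str, Any]],
        model_key: str = "model",
    ) -> None:
        self._train = train          # callable that runs the training pipeline
        self._model_key = model_key  # name of the model among the outputs
        self._model: Optional[Any] = None

    def refresh(self) -> None:
        """Re-run training and keep the resulting model object in memory."""
        outputs = self._train()
        self._model = outputs[self._model_key]

    def predict(self, instances: Any) -> Any:
        """Predict with the in-memory model, training once on first use."""
        if self._model is None:
            self.refresh()
        return self._model.predict(instances)
```

In a real project, `train` could wrap a programmatic Kedro run, on the assumption that the model is a node output that is not persisted in the catalog, so that the run hands it back in memory rather than writing it to disk.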
Thank you in advance. It would be nice to know how other people are deploying their Kedro pipelines.
(P.S. I find that issue #795 is a more extensive description of many of the issues I have.)