Quick Start

On this page, we will guide you through the steps to:

  1. Install Numaflow.
  2. Create and run a simple pipeline.
  3. Create and run an advanced pipeline.

Before you begin: prerequisites

To try Numaflow, you will first need to set up one of the following options to run container images:

Then use one of the following options to create a local Kubernetes cluster:

You will also need kubectl to manage the cluster. Follow these steps to install kubectl. In case you need a refresher, all the kubectl commands used in this quick start guide can be found in the kubectl Cheat Sheet.

Installing Numaflow

Once you have completed all the prerequisites, run the following commands to install Numaflow and start the Inter-Step Buffer Service, which handles communication between vertices.

kubectl create ns numaflow-system
kubectl apply -n numaflow-system -f https://raw.githubusercontent.com/numaproj/numaflow/main/config/install.yaml
kubectl apply -f https://raw.githubusercontent.com/numaproj/numaflow/main/examples/0-isbsvc-jetstream.yaml
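The installation can take a little while to settle. As a rough sketch, the helper below waits for it to finish. It assumes the `numaflow-server` deployment that this guide port-forwards to later, and the three `isbsvc-default-js-N` pods shown in the sample output further down. The function name `wait_for_numaflow` is hypothetical, and `kubectl wait` may report "not found" if a pod has not been created yet, in which case simply rerun it.

```shell
# Sketch: block until the Numaflow server and the ISB service pods are ready.
# Resource names are taken from this guide's sample output; adjust them if yours differ.
wait_for_numaflow() {
  wait_timeout="${1:-120s}"
  # numaflow-server is the deployment this guide port-forwards to for the UI.
  kubectl -n numaflow-system rollout status deployment/numaflow-server \
    --timeout="$wait_timeout" || return 1
  # The JetStream ISB service pods appear as isbsvc-default-js-0..2 in the sample output.
  for i in 0 1 2; do
    kubectl wait --for=condition=Ready "pod/isbsvc-default-js-$i" \
      --timeout="$wait_timeout" || return 1
  done
}
```

Call it as `wait_for_numaflow` (default timeout of 120s) or `wait_for_numaflow 5m`.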

Creating a simple pipeline

As an example, we will create a simple pipeline that contains a source vertex to generate messages, a processing vertex that echoes the messages, and a sink vertex that logs the messages.

Run the command below to create a simple pipeline.

kubectl apply -f https://raw.githubusercontent.com/numaproj/numaflow/main/examples/1-simple-pipeline.yaml

To view a list of pipelines you've created, run:

kubectl get pipeline # or "pl" as a short name

This should produce output like the following, with AGE indicating the time elapsed since your simple pipeline was created.

NAME              PHASE     MESSAGE   VERTICES   AGE
simple-pipeline   Running             3          9s
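If you want to check the phase from a script rather than by eye, one option is to pull the PHASE column out of the table shown above. The helper name `pipeline_phase` is hypothetical, and the sketch assumes the column layout matches the sample response.

```shell
# Print the PHASE column for one pipeline.
# Reads `kubectl get pipeline` output (header included) on stdin.
pipeline_phase() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

# Example: kubectl get pipeline | pipeline_phase simple-pipeline
```

The function prints nothing when the named pipeline is not in the list, so an empty result can be treated as "not created yet".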

To inspect the status of the pipeline, use kubectl get pods. Note that the pod names will be different from the sample response:

# Wait for pods to be ready
kubectl get pods

NAME                                         READY   STATUS      RESTARTS   AGE
isbsvc-default-js-0                          3/3     Running     0          19s
isbsvc-default-js-1                          3/3     Running     0          19s
isbsvc-default-js-2                          3/3     Running     0          19s
simple-pipeline-daemon-78b798fb98-qf4t4      1/1     Running     0          10s
simple-pipeline-out-0-xc0pf                  1/1     Running     0          10s
simple-pipeline-cat-0-kqrhy                  2/2     Running     0          10s
simple-pipeline-in-0-rhpjm                   1/1     Running     0          11s
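To see at a glance which pods are still starting, you can compare the two halves of the READY column. This is a small sketch under the assumption that the output has the same columns as the sample above; `pods_not_ready` is a hypothetical helper name.

```shell
# List pods whose READY column (ready/total) is not yet complete.
# Reads `kubectl get pods` output (header included) on stdin.
pods_not_ready() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Example: kubectl get pods | pods_not_ready
```

An empty result means every listed pod reports all of its containers as ready.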

Now you can watch the log for the output vertex. Run the command below and remember to replace xxxxx with the appropriate pod name above.

kubectl logs -f simple-pipeline-out-0-xxxxx
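Instead of copying the pod name by hand, you could look it up by its prefix. The sketch below assumes pod names follow the pattern in the sample output; `pod_with_prefix` is a hypothetical helper name.

```shell
# Print the first pod whose name starts with the given prefix.
# Reads `kubectl get pods` output on stdin.
pod_with_prefix() {
  awk -v p="$1" '!found && index($1, p) == 1 { print $1; found = 1 }'
}

# Example:
#   kubectl logs -f "$(kubectl get pods | pod_with_prefix simple-pipeline-out-0-)"
```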

This should generate an output like the sample below:

2022/08/25 23:59:38 (out) {"Data":"VT+G+/W7Dhc=","Createdts":1661471977707552597}
2022/08/25 23:59:38 (out) {"Data":"0TaH+/W7Dhc=","Createdts":1661471977707615953}
2022/08/25 23:59:38 (out) {"Data":"EEGH+/W7Dhc=","Createdts":1661471977707618576}
2022/08/25 23:59:38 (out) {"Data":"WESH+/W7Dhc=","Createdts":1661471977707619416}
2022/08/25 23:59:38 (out) {"Data":"YEaH+/W7Dhc=","Createdts":1661471977707619936}
2022/08/25 23:59:39 (out) {"Data":"qfomN/a7Dhc=","Createdts":1661471978707942057}
2022/08/25 23:59:39 (out) {"Data":"aUcnN/a7Dhc=","Createdts":1661471978707961705}
2022/08/25 23:59:39 (out) {"Data":"iUonN/a7Dhc=","Createdts":1661471978707962505}
2022/08/25 23:59:39 (out) {"Data":"mkwnN/a7Dhc=","Createdts":1661471978707963034}
2022/08/25 23:59:39 (out) {"Data":"jk4nN/a7Dhc=","Createdts":1661471978707963534}
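The `Createdts` values in the sample look like nanoseconds since the Unix epoch. That is an inference from the numbers, not something this guide states. Under that assumption, a log line can be converted to a readable UTC time as sketched below; `createdts_to_utc` is a hypothetical helper name.

```shell
# Convert the Createdts field of a sink log line to a UTC timestamp.
# Assumption: Createdts is nanoseconds since the Unix epoch.
createdts_to_utc() {
  ns="$(printf '%s\n' "$1" | sed -n 's/.*"Createdts":\([0-9][0-9]*\).*/\1/p')"
  [ -n "$ns" ] || return 1
  secs=$((ns / 1000000000))
  # GNU date accepts -d "@seconds"; BSD/macOS date accepts -r seconds.
  date -u -d "@$secs" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null ||
    date -u -r "$secs" '+%Y-%m-%dT%H:%M:%SZ'
}

# Example:
#   createdts_to_utc '2022/08/25 23:59:38 (out) {"Data":"VT+G+/W7Dhc=","Createdts":1661471977707552597}'
```

For the first sample line this yields a time about one second before the log prefix, which is consistent with the message being created shortly before the sink logged it.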

Numaflow also comes with a built-in user interface.

NOTE: Please install the metrics server if your local Kubernetes cluster does not include it by default (e.g., Kind). You can install it by running the commands below.

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl patch -n kube-system deployment metrics-server --type=json -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

To port forward the UI, run the following command.

# Port forward the UI to https://localhost:8443/
kubectl -n numaflow-system port-forward deployment/numaflow-server 8443:8443

This renders the following UI on https://localhost:8443/.

[Image: Numaflow UI]

The pipeline can be deleted by issuing the following command:

kubectl delete -f https://raw.githubusercontent.com/numaproj/numaflow/main/examples/1-simple-pipeline.yaml

Creating an advanced pipeline

Now we will walk you through creating an advanced pipeline. In our example, this is called the even-odd pipeline, illustrated by the following diagram:

[Image: Pipeline Diagram]

There are five vertices in this advanced pipeline: an HTTP source vertex that serves an HTTP endpoint to receive numbers as source data, a UDF vertex that tags the ingested numbers with the key even or odd, and three Log sinks. One sink prints the even numbers, one prints the odd numbers, and one prints both.
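The routing described above can be illustrated with a few lines of shell. This is only a model of the expected behavior, not the UDF's source code. The sink names `even-sink`, `odd-sink`, and `number-sink` are assumed from the pod names and log prefixes in the sample output, and `expected_sinks` is a hypothetical helper name.

```shell
# Illustration only: which Log sinks should print a given number,
# following the even/odd description above.
expected_sinks() {
  if [ $(( $1 % 2 )) -eq 0 ]; then
    echo "even-sink number-sink"
  else
    echo "odd-sink number-sink"
  fi
}

# Example: expected_sinks 102
```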

Run the following command to create the even-odd pipeline.

kubectl apply -f https://raw.githubusercontent.com/numaproj/numaflow/main/examples/2-even-odd-pipeline.yaml

You may opt to view the list of pipelines you've created so far by running kubectl get pipeline. Otherwise, proceed to inspect the status of the pipeline, using kubectl get pods.

# Wait for pods to be ready
kubectl get pods

NAME                               READY   STATUS    RESTARTS   AGE
even-odd-daemon-64d65c945d-vjs9f   1/1     Running   0          5m3s
even-odd-even-or-odd-0-pr4ze       2/2     Running   0          30s
even-odd-even-sink-0-unffo         1/1     Running   0          22s
even-odd-in-0-a7iyd                1/1     Running   0          5m3s
even-odd-number-sink-0-zmg2p       1/1     Running   0          7s
even-odd-odd-sink-0-2736r          1/1     Running   0          15s
isbsvc-default-js-0                3/3     Running   0          10m
isbsvc-default-js-1                3/3     Running   0          10m
isbsvc-default-js-2                3/3     Running   0          10m

Next, port-forward the HTTP endpoint, and make a POST request using curl. Remember to replace xxxxx with the appropriate pod names both here and in the next step.

kubectl port-forward even-odd-in-0-xxxxx 8444:8443

# Post data to the HTTP endpoint
curl -kq -X POST -d "101" https://localhost:8444/vertices/in
curl -kq -X POST -d "102" https://localhost:8444/vertices/in
curl -kq -X POST -d "103" https://localhost:8444/vertices/in
curl -kq -X POST -d "104" https://localhost:8444/vertices/in
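To send more than a handful of numbers, the same `curl` call can be wrapped in a loop. The sketch below reuses the exact flags and URL from the commands above; `post_numbers` is a hypothetical helper name.

```shell
# POST each given number to the HTTP source endpoint, stopping on the first failure.
post_numbers() {
  url="$1"
  shift
  for n in "$@"; do
    curl -kq -X POST -d "$n" "$url" || return 1
  done
}

# Example: post_numbers https://localhost:8444/vertices/in 101 102 103 104
```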

Now you can watch the log for the even and odd vertices by running the commands below.

# Watch the log for the even vertex
kubectl logs -f even-odd-even-sink-0-xxxxx
2022/09/07 22:29:40 (even-sink) 102
2022/09/07 22:29:40 (even-sink) 104

# Watch the log for the odd vertex
kubectl logs -f even-odd-odd-sink-0-xxxxx
2022/09/07 22:30:19 (odd-sink) 101
2022/09/07 22:30:19 (odd-sink) 103

View the UI for the advanced pipeline at https://localhost:8443/.

[Image: Numaflow UI]

The source code of the even-odd user-defined function can be found here. You can also replace the Log Sink with another sink, such as Kafka, to forward the data to Kafka topics.

The pipeline can be deleted by issuing the following command:

kubectl delete -f https://raw.githubusercontent.com/numaproj/numaflow/main/examples/2-even-odd-pipeline.yaml

A pipeline with reduce (aggregation)

To set up an example pipeline with the Reduce UDF, see Reduce Examples.

What's Next

Try more examples in the examples directory.

After exploring how Numaflow pipelines run, you can check what data Sources and Sinks Numaflow supports out of the box, or learn how to write User-defined Functions.

Numaflow can also be paired with Numalogic, a collection of ML models and algorithms for real-time data analytics and AIOps including anomaly detection. Visit the Numalogic homepage for more information.