chore(components): Added kfp_deploy_model_to_kserve_demo
Signed-off-by: Helber Belmiro <[email protected]>

docs(backend): improved backend README (kubeflow#11511)

* improved backend README

Signed-off-by: Daniel Dowler <[email protected]>

* Update backend/README.md

Co-authored-by: Helber Belmiro <[email protected]>
Signed-off-by: Daniel Dowler <[email protected]>

* Update backend/README.md

Co-authored-by: Helber Belmiro <[email protected]>
Signed-off-by: Daniel Dowler <[email protected]>

* Update backend/README.md

Co-authored-by: Helber Belmiro <[email protected]>
Signed-off-by: Daniel Dowler <[email protected]>

* Update backend/README.md

Co-authored-by: Helber Belmiro <[email protected]>
Signed-off-by: Daniel Dowler <[email protected]>

---------

Signed-off-by: Daniel Dowler <[email protected]>
Co-authored-by: Helber Belmiro <[email protected]>

fix(CI): Use the correct image registry for replacements in integration tests (kubeflow#11564)

* Use the correct image registry for replacements in integration tests

The image registry was changed to GitHub Container Registry in the 2.4
release.

Signed-off-by: mprahl <[email protected]>

* Print the pod logs when the pods fail to start in integration tests

Signed-off-by: mprahl <[email protected]>

* Fix the sample compilation in the API server container build

Signed-off-by: mprahl <[email protected]>

* Show the output when building the container images in CI

Signed-off-by: mprahl <[email protected]>

---------

Signed-off-by: mprahl <[email protected]>

feat(api): Add SemaphoreKey and MutexName fields to proto (kubeflow#11384)

Signed-off-by: ddalvi <[email protected]>
hbelmiro authored and mholder6 committed Jan 31, 2025
1 parent 1234c8d commit e71727d
Showing 21 changed files with 1,193 additions and 62 deletions.
6 changes: 3 additions & 3 deletions .github/resources/manifests/argo/kustomization.yaml
@@ -5,13 +5,13 @@ resources:
- ../../../../manifests/kustomize/env/platform-agnostic

images:
- name: gcr.io/ml-pipeline/api-server
- name: ghcr.io/kubeflow/kfp-api-server
  newName: kind-registry:5000/apiserver
  newTag: latest
- name: gcr.io/ml-pipeline/persistenceagent
- name: ghcr.io/kubeflow/kfp-persistence-agent
  newName: kind-registry:5000/persistenceagent
  newTag: latest
- name: gcr.io/ml-pipeline/scheduledworkflow
- name: ghcr.io/kubeflow/kfp-scheduled-workflow-controller
  newName: kind-registry:5000/scheduledworkflow
  newTag: latest

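Why the old `gcr.io/...` entries stopped working can be sketched in a few lines of Python. This is a simplified model of kustomize's `images` transformer, not its implementation: it assumes an override applies only when an image's name (tag stripped) equals the `name` field exactly, in which case the name and tag are replaced. Because the 2.4 manifests reference `ghcr.io/kubeflow/...` images, the `gcr.io/ml-pipeline/...` entries match nothing and the kind-registry images are never substituted. The helper names and the `2.4.0` tag are illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical override table shaped like the kustomization `images:` entries:
# image name -> (newName, newTag)
Overrides = Dict[str, Tuple[str, Optional[str]]]


def split_image(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'registry/repo:tag' into (name, tag).

    A colon before the last '/' belongs to a registry port
    (e.g. 'kind-registry:5000/apiserver'), so it is not a tag separator.
    """
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, None


def apply_overrides(ref: str, overrides: Overrides) -> str:
    """Rewrite an image reference only if its name matches an override exactly."""
    name, tag = split_image(ref)
    if name not in overrides:
        return ref  # unmatched references are left untouched
    new_name, new_tag = overrides[name]
    final_tag = new_tag or tag
    return f"{new_name}:{final_tag}" if final_tag else new_name


old_entries: Overrides = {"gcr.io/ml-pipeline/api-server": ("kind-registry:5000/apiserver", "latest")}
new_entries: Overrides = {"ghcr.io/kubeflow/kfp-api-server": ("kind-registry:5000/apiserver", "latest")}

manifest_image = "ghcr.io/kubeflow/kfp-api-server:2.4.0"  # illustrative tag
print(apply_overrides(manifest_image, old_entries))  # unchanged: no name matches
print(apply_overrides(manifest_image, new_entries))  # kind-registry:5000/apiserver:latest
```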
6 changes: 3 additions & 3 deletions .github/resources/manifests/tekton/kustomization.yaml
@@ -14,13 +14,13 @@ resources:
# when application is deleted.

images:
- name: gcr.io/ml-pipeline/api-server
- name: ghcr.io/kubeflow/kfp-api-server
  newName: kind-registry:5000/apiserver
  newTag: latest
- name: gcr.io/ml-pipeline/persistenceagent
- name: ghcr.io/kubeflow/kfp-persistence-agent
  newName: kind-registry:5000/persistenceagent
  newTag: latest
- name: gcr.io/ml-pipeline/scheduledworkflow
- name: ghcr.io/kubeflow/kfp-scheduled-workflow-controller
  newName: kind-registry:5000/scheduledworkflow
  newTag: latest
- name: '*/aipipeline/tekton-exithandler-controller'
10 changes: 5 additions & 5 deletions .github/resources/scripts/build-images.sh
@@ -25,35 +25,35 @@ EXIT_CODE=0

docker system prune -a -f

docker build -q -t "${REGISTRY}/apiserver:${TAG}" -f backend/Dockerfile . && docker push "${REGISTRY}/apiserver:${TAG}" || EXIT_CODE=$?
docker build --progress=plain -t "${REGISTRY}/apiserver:${TAG}" -f backend/Dockerfile . && docker push "${REGISTRY}/apiserver:${TAG}" || EXIT_CODE=$?
if [[ $EXIT_CODE -ne 0 ]]
then
echo "Failed to build apiserver image."
exit $EXIT_CODE
fi

docker build -q -t "${REGISTRY}/persistenceagent:${TAG}" -f backend/Dockerfile.persistenceagent . && docker push "${REGISTRY}/persistenceagent:${TAG}" || EXIT_CODE=$?
docker build --progress=plain -t "${REGISTRY}/persistenceagent:${TAG}" -f backend/Dockerfile.persistenceagent . && docker push "${REGISTRY}/persistenceagent:${TAG}" || EXIT_CODE=$?
if [[ $EXIT_CODE -ne 0 ]]
then
echo "Failed to build persistenceagent image."
exit $EXIT_CODE
fi

docker build -q -t "${REGISTRY}/scheduledworkflow:${TAG}" -f backend/Dockerfile.scheduledworkflow . && docker push "${REGISTRY}/scheduledworkflow:${TAG}" || EXIT_CODE=$?
docker build --progress=plain -t "${REGISTRY}/scheduledworkflow:${TAG}" -f backend/Dockerfile.scheduledworkflow . && docker push "${REGISTRY}/scheduledworkflow:${TAG}" || EXIT_CODE=$?
if [[ $EXIT_CODE -ne 0 ]]
then
echo "Failed to build scheduledworkflow image."
exit $EXIT_CODE
fi

docker build -q -t "${REGISTRY}/driver:${TAG}" -f backend/Dockerfile.driver . && docker push "${REGISTRY}/driver:${TAG}" || EXIT_CODE=$?
docker build --progress=plain -t "${REGISTRY}/driver:${TAG}" -f backend/Dockerfile.driver . && docker push "${REGISTRY}/driver:${TAG}" || EXIT_CODE=$?
if [[ $EXIT_CODE -ne 0 ]]
then
echo "Failed to build driver image."
exit $EXIT_CODE
fi

docker build -q -t "${REGISTRY}/launcher:${TAG}" -f backend/Dockerfile.launcher . && docker push "${REGISTRY}/launcher:${TAG}" || EXIT_CODE=$?
docker build --progress=plain -t "${REGISTRY}/launcher:${TAG}" -f backend/Dockerfile.launcher . && docker push "${REGISTRY}/launcher:${TAG}" || EXIT_CODE=$?
if [[ $EXIT_CODE -ne 0 ]]
then
echo "Failed to build launcher image."
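The per-image pattern in the script (`docker build ... && docker push ... || EXIT_CODE=$?`, then exit on a non-zero code) can be restated as a Python sketch. The `run` parameter is a hypothetical hook so the control flow can be exercised without Docker; by default it shells out through `subprocess.run`. The image list covers only the five images visible in this truncated hunk. The switch from `-q` to `--progress=plain` matters because `-q` suppresses the build output, whereas plain progress prints every build step to the CI log.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# (image name, Dockerfile) pairs visible in the hunk above; the script may build more.
IMAGES: List[Tuple[str, str]] = [
    ("apiserver", "backend/Dockerfile"),
    ("persistenceagent", "backend/Dockerfile.persistenceagent"),
    ("scheduledworkflow", "backend/Dockerfile.scheduledworkflow"),
    ("driver", "backend/Dockerfile.driver"),
    ("launcher", "backend/Dockerfile.launcher"),
]


def _run_docker(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def build_and_push(registry: str, tag: str, run: Callable[[Sequence[str]], int] = _run_docker) -> int:
    """Build and push each image in order; stop at the first failure and return its exit code."""
    for name, dockerfile in IMAGES:
        ref = f"{registry}/{name}:{tag}"
        # --progress=plain streams every build step to the CI log; -q suppressed it.
        code = run(["docker", "build", "--progress=plain", "-t", ref, "-f", dockerfile, "."])
        if code == 0:
            code = run(["docker", "push", ref])  # push only after a successful build, like `&&`
        if code != 0:
            print(f"Failed to build {name} image.")
            return code
    return 0
```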
13 changes: 13 additions & 0 deletions .github/resources/scripts/kfp-readiness/wait_for_pods.py
@@ -13,6 +13,17 @@
config.load_kube_config()
v1 = client.CoreV1Api()

def log_pods():
    pods = v1.list_namespaced_pod(namespace=namespace)

    for pod in pods.items:
        try:
            logging.info(
                f"---- Pod {namespace}/{pod.metadata.name} logs ----\n"
                + v1.read_namespaced_pod_log(pod.metadata.name, namespace)
            )
        except client.exceptions.ApiException:
            continue

def get_pod_statuses():
    pods = v1.list_namespaced_pod(namespace=namespace)
@@ -74,6 +85,8 @@ def check_pods(calm_time=10, timeout=600, retries_after_ready=5):
logging.info(f"Pods are still stabilizing. Retrying in {calm_time} seconds...")
time.sleep(calm_time)
else:
log_pods()

raise Exception("Pods did not stabilize within the timeout period.")

logging.info("Final pod statuses:")
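The change to `wait_for_pods.py` dumps every pod's logs right before the script gives up, so a failed CI run shows why the pods never became ready. A minimal, dependency-free sketch of that flow follows. `read_log`, `is_stable`, and `sleep` are hypothetical injected callables standing in for `CoreV1Api.read_namespaced_pod_log`, the script's status check, and `time.sleep`; `LookupError` stands in for `ApiException`. The real `check_pods` loop (including its `retries_after_ready` handling) is truncated in this diff and may differ.

```python
import logging
from typing import Callable, Iterable


def dump_logs(namespace: str, pod_names: Iterable[str], read_log: Callable[[str], str]) -> int:
    """Log each pod's output, skipping pods whose logs cannot be read. Returns how many were logged."""
    logged = 0
    for name in pod_names:
        try:
            logging.info(f"---- Pod {namespace}/{name} logs ----\n" + read_log(name))
            logged += 1
        except LookupError:  # stand-in for client.exceptions.ApiException
            continue
    return logged


def wait_until_stable(
    is_stable: Callable[[], bool],
    on_timeout: Callable[[], None],
    sleep: Callable[[float], None],
    calm_time: float = 10,
    timeout: float = 600,
) -> None:
    """Poll until is_stable() is true; on timeout, run on_timeout() and then fail."""
    waited = 0.0
    while waited < timeout:
        if is_stable():
            return
        logging.info(f"Pods are still stabilizing. Retrying in {calm_time} seconds...")
        sleep(calm_time)
        waited += calm_time
    on_timeout()  # e.g. dump pod logs so the CI output shows why startup failed
    raise Exception("Pods did not stabilize within the timeout period.")
```

Injecting the callables keeps the timeout logic testable without a cluster; in the real script the log dump simply runs in the `else:` branch before the exception is raised.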
36 changes: 30 additions & 6 deletions api/v2alpha1/go/pipelinespec/pipeline_spec.pb.go

Some generated files are not rendered by default.

6 changes: 5 additions & 1 deletion api/v2alpha1/pipeline_spec.proto
@@ -1106,5 +1106,9 @@ message PlatformDeploymentConfig {

// Spec for pipeline-level config options. See PipelineConfig DSL class.
message PipelineConfig {
// TODO add pipeline-level configs
// Name of the semaphore key to control pipeline concurrency
string semaphore_key = 1;

// Name of the mutex to ensure mutual exclusion
string mutex_name = 2;
}
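The two new `PipelineConfig` fields are plain strings, so under proto3 an empty string means "not set". This commit only adds the fields; how the backend consumes them is not shown here. As a hedged illustration, the sketch below assumes they are translated into an Argo Workflows `spec.synchronization` stanza (a semaphore whose limit is read from a ConfigMap key, and a named mutex). The function name `synchronization_for` and the ConfigMap name `semaphore-config` are hypothetical.

```python
from typing import Any, Dict, Optional


def synchronization_for(
    semaphore_key: str = "",
    mutex_name: str = "",
    configmap_name: str = "semaphore-config",  # hypothetical ConfigMap holding semaphore limits
) -> Optional[Dict[str, Any]]:
    """Hypothetical mapping of PipelineConfig fields to an Argo `spec.synchronization` dict.

    Empty strings are proto3 defaults and are treated as "unset".
    """
    sync: Dict[str, Any] = {}
    if semaphore_key:
        # Argo reads the concurrency limit from this ConfigMap key.
        sync["semaphore"] = {"configMapKeyRef": {"name": configmap_name, "key": semaphore_key}}
    if mutex_name:
        # Only one workflow holding this mutex name runs at a time.
        sync["mutex"] = {"name": mutex_name}
    return sync or None
```

Newer Argo Workflows releases also support lists of semaphores and mutexes, so the exact shape depends on the Argo version deployed.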
4 changes: 2 additions & 2 deletions backend/Dockerfile
@@ -54,9 +54,9 @@ COPY backend/src/apiserver/config/sample_config.json /samples/
# Compiling the preloaded samples.
# The default image is replaced with the GCR-hosted python image.
RUN set -e; \
< /samples/sample_config.json jq .[].file --raw-output | while read pipeline_yaml; do \
< /samples/sample_config.json jq ".pipelines[].file" --raw-output | while read pipeline_yaml; do \
pipeline_py="${pipeline_yaml%.yaml}"; \
python3 "$pipeline_py"; \
echo "Compiling: \"$pipeline_py\"" && python3 "$pipeline_py" && echo -n "Output: " && ls "$pipeline_py.yaml"; \
done

# 3. Start api web server
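The updated `jq` filter implies that `sample_config.json` now nests its entries under a top-level `pipelines` array; the old `.[].file` filter would try to read `file` from that array itself and fail. The shell expansion `${pipeline_yaml%.yaml}` then strips one trailing `.yaml` to get the Python source that is compiled. A Python sketch of the same selection follows; the sample entry is made up.

```python
import json
from typing import List


def pipeline_sources(config_text: str) -> List[str]:
    """List the Python sample sources named in sample_config.json.

    Mirrors `jq ".pipelines[].file" --raw-output` followed by the shell
    expansion ${pipeline_yaml%.yaml}, which removes one trailing ".yaml".
    """
    config = json.loads(config_text)
    sources = []
    for entry in config["pipelines"]:
        path = entry["file"]
        sources.append(path[: -len(".yaml")] if path.endswith(".yaml") else path)
    return sources


# Illustrative config only; the real entries live in backend/src/apiserver/config/sample_config.json.
example = '{"pipelines": [{"name": "Hello world", "file": "/samples/hello_world.py.yaml"}]}'
print(pipeline_sources(example))  # ['/samples/hello_world.py']
```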
94 changes: 52 additions & 42 deletions backend/README.md
@@ -1,6 +1,20 @@
# Kubeflow Pipelines Backend

## Overview

This directory contains code for the components that comprise the Kubeflow
Pipelines backend.

This README will help you set up your coding environment in order to build and run the Kubeflow Pipelines backend. The KFP backend powers the core functionality of the KFP platform, handling API requests, workflow management, and data persistence.

## Prerequisites
Before you begin, ensure you have:
- Go programming language installed
- [go-licenses tool](../hack/install-go-licenses.sh)
- Docker or Podman installed (for building container images)

Note that you may need to restart your shell after installing these resources in order for the changes to take effect.

## Building & Testing

To run all unittests for backend:
@@ -15,64 +29,46 @@ The API server itself can be built using:
go build -o /tmp/apiserver backend/src/apiserver/*.go
```

## Code Style

Backend codebase follows the [Google's Go Style Guide](https://google.github.io/styleguide/go/). Please, take time to get familiar with the [best practices](https://google.github.io/styleguide/go/best-practices). It is not intended to be exhaustive, but it often helps minimizing guesswork among developers and keep codebase uniform and consistent.

We use [golangci-lint](https://golangci-lint.run/) tool that can catch common mistakes locally (see detailed configuration [here](https://github.com/kubeflow/pipelines/blob/master/.golangci.yaml)). It can be [conveniently integrated](https://golangci-lint.run/usage/integrations/) with multiple popular IDEs such as VS Code or Vim.

Finally, it is advised to install [pre-commit](https://pre-commit.com/) in order to automate linter checks (see configuration [here](https://github.com/kubeflow/pipelines/blob/master/.pre-commit-config.yaml))

## Building APIServer image locally

The API server image can be built from the root folder of the repo using:
```
export API_SERVER_IMAGE=api_server
docker build -f backend/Dockerfile . --tag $API_SERVER_IMAGE
```
## Deploy APIServer with the image you own build
### Deploying the APIServer (from the image you built) on Kubernetes

Run
First, push your image to a registry that is accessible from your Kubernetes cluster.

Then, run:
```
kubectl edit deployment.v1.apps/ml-pipeline -n kubeflow
```
You'll see the field reference the api server docker image.
You'll see the field reference the api server container image (`spec.containers[0].image: gcr.io/ml-pipeline/api-server:<image-version>`).
Change it to point to your own build, after saving and closing the file, apiserver will restart with your change.
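Instead of editing the Deployment interactively, the same image swap can be applied with `kubectl patch deployment ml-pipeline -n kubeflow --patch '<json>'`, which uses a strategic merge patch by default and merges the `containers` list by container `name`. In a Deployment the image lives at `spec.template.spec.containers[].image`. The Python sketch below only builds the patch body; the container name `ml-pipeline-api-server` is an assumption (confirm it with `kubectl get deployment ml-pipeline -n kubeflow -o yaml`), and the image reference is a placeholder.

```python
import json


def image_patch(container: str, image: str) -> str:
    """Build a strategic-merge patch that sets one container's image in a Deployment."""
    patch = {"spec": {"template": {"spec": {"containers": [{"name": container, "image": image}]}}}}
    return json.dumps(patch)


# Container name and image are assumptions/placeholders.
print(image_patch("ml-pipeline-api-server", "registry.example.com/kfp/api-server:dev"))
```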

## Building client library and swagger files
### Building client library and swagger files

After making changes to proto files, the Go client libraries, Python client libraries and swagger files
need to be regenerated and checked-in. Refer to [backend/api](./api/README.md) for details.

## Updating licenses info

1. [Install go-licenses tool](../hack/install-go-licenses.sh) and refer to [its documentation](https://github.com/google/go-licenses) for how to use it.
### Updating licenses info

1. [Install go-licenses tool](../hack/install-go-licenses.sh) (if you haven't already) and refer to [its documentation](https://github.com/google/go-licenses) for how to use it.

2. Run the tool to update all licenses:

```bash
make all
make -C backend all
```

## Updating python dependencies

[pip-tools](https://github.com/jazzband/pip-tools) is used to manage python
dependencies. To update dependencies, edit [requirements.in](requirements.in)
and run `./update_requirements.sh` to update and pin the transitive
dependencies.

# Visualization Server Instructions

## Updating python dependencies
### Updating python dependencies

[pip-tools](https://github.com/jazzband/pip-tools) is used to manage python
dependencies. To update dependencies, edit [requirements.in](requirements.in)
and run `./update_requirements.sh` to update and pin the transitive
dependencies.


## Building conformance tests (WIP)
### Building conformance tests (WIP)

Run
```
@@ -81,7 +77,7 @@ docker build . -f backend/Dockerfile.conformance -t <tag>
## API Server Development
### Run Locally With a Kind Cluster
### Run the KFP Backend Locally With a Kind Cluster
This deploys a local Kubernetes cluster leveraging [kind](https://kind.sigs.k8s.io/), with all the components required
to run the Kubeflow Pipelines API server. Note that the `ml-pipeline` `Deployment` (API server) has its replicas set to
@@ -99,6 +95,7 @@ pods on the cluster using the `ml-pipeline` `Service`.
network interface through Docker/Podman Desktop. See
[kind #1200](https://github.com/kubernetes-sigs/kind/issues/1200#issuecomment-1304855791) for an example manifest.
* Optional: VSCode is installed to leverage a sample `launch.json` file.
* This relies on dlv: (go install -v github.com/go-delve/delve/cmd/dlv@latest)
#### Provisioning the Cluster
@@ -111,15 +108,9 @@ make -C backend dev-kind-cluster
This may take several minutes since there are many pods. Note that many pods will be in "CrashLoopBackOff" status until
all the pods have started.

#### Deleting the Cluster

Run the following to delete the cluster:
Also, note that the config in the `make` command above sets the `ml-pipeline` `Deployment` (api server) to have 0 replicas. The intent is to replace it with a locally running API server for debugging and faster development. See the following steps to run the API server locally, and connect it to the KFP backend on your Kind cluster. Note that other backend components (for example, the persistence agent) may show errors until the API server is brought up and connected to the cluster.

```bash
kind delete clusters dev-pipelines-api
```

#### Launch the API Server With VSCode
#### Launching the API Server With VSCode

After the cluster is provisioned, you may leverage the following sample `.vscode/launch.json` file to run the API
server locally:
@@ -168,12 +159,12 @@ You can also directly connect to the MariaDB database server with:
mysql -h 127.0.0.1 -u root
```

## Remote Debug the Driver
### Remote Debug the Driver

These instructions assume you are leveraging the Kind cluster in the
[Run Locally With a Kind Cluster](#run-locally-with-a-kind-cluster) section.

### Build the Driver Image With Debug Prerequisites
#### Build the Driver Image With Debug Prerequisites

Run the following to create the `backend/Dockerfile.driver-debug` file and build the container image
tagged as `kfp-driver:debug`. This container image is based on `backend/Dockerfile.driver` but installs
@@ -197,7 +188,7 @@ Alternatively, you can use this Make target that does both.
make -C kind-build-and-load-driver-debug
```

### Run the API Server With Debug Configuration
#### Run the API Server With Debug Configuration

You may use the following VS Code `launch.json` file to run the API server which overrides the Driver
command to use Delve and the Driver image to use debug image built previously.
@@ -229,7 +220,7 @@ command to use Delve and the Driver image to use debug image built previously.
}
```

### Starting a Remote Debug Session
#### Starting a Remote Debug Session

Start by launching a pipeline. This will eventually create a Driver pod that is waiting for a remote debug connection.

@@ -273,3 +264,22 @@ For debugging a specific Driver pod, you'll need to continuously port forward and
without a breakpoint so that Delve will continue execution until the Driver pod you are interested in starts up. At that
point, you can set a break point, port forward, and connect to the remote debug session to debug that specific Driver
pod.

### Deleting the Kind Cluster

Run the following to delete the cluster (once you are finished):

```bash
kind delete clusters dev-pipelines-api
```

## Contributing
### Code Style
Backend codebase follows the [Google's Go Style Guide](https://google.github.io/styleguide/go/). Please, take time to get familiar with the [best practices](https://google.github.io/styleguide/go/best-practices). It is not intended to be exhaustive, but it often helps minimizing guesswork among developers and keep codebase uniform and consistent.

We use [golangci-lint](https://golangci-lint.run/) tool that can catch common mistakes locally (see detailed configuration [here](https://github.com/kubeflow/pipelines/blob/master/.golangci.yaml)). It can be [conveniently integrated](https://golangci-lint.run/usage/integrations/) with multiple popular IDEs such as VS Code or Vim.

Finally, it is advised to install [pre-commit](https://pre-commit.com/) in order to automate linter checks (see configuration [here](https://github.com/kubeflow/pipelines/blob/master/.pre-commit-config.yaml))

@@ -0,0 +1,15 @@
FROM python:3.9-slim-bullseye
RUN apt-get update && apt-get install -y gcc python3-dev

COPY requirements.txt .
RUN pip install --upgrade pip
RUN python3 -m pip install --upgrade -r \
requirements.txt --quiet --no-cache-dir \
&& rm -f requirements.txt

ENV APP_HOME /app
COPY kservedeployer.py $APP_HOME/kservedeployer.py
WORKDIR $APP_HOME

ENTRYPOINT ["python"]
CMD ["kservedeployer.py"]