Skip to content

Commit

Permalink
Add documentation for testing on Kubernetes (#23)
Browse files Browse the repository at this point in the history
* add Dockerfile and docs, modify example

* doc fixes
  • Loading branch information
vakarisbk authored Oct 7, 2024
1 parent a84eb1c commit 880544a
Show file tree
Hide file tree
Showing 3 changed files with 106 additions and 4 deletions.
71 changes: 71 additions & 0 deletions docs/testing-on-k8s.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,71 @@
# Testing in Kubernetes

This guide explains how to test DataFusion Ray on Kubernetes during development. It assumes you have an existing Kubernetes cluster.

## 1. Deploy the KubeRay Operator

To manage Ray clusters, you need to deploy the KubeRay operator using Helm. This step is required once per Kubernetes cluster.

```shell
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Install the Custom Resource Definitions (CRDs) and KubeRay operator
helm install kuberay-operator kuberay/kuberay-operator

# Verify that the operator is running in the `default` namespace.
kubectl get pods

# Example output:
# NAME READY STATUS RESTARTS AGE
# kuberay-operator-7fbdbf8c89-pt8bk 1/1 Running 0 27s
```

You can customize the operator's settings (e.g., resource limits and requests). For basic testing, the default configuration should suffice.
For more details and customization options, refer to the [KubeRay Helm Chart documentation](https://github.com/ray-project/kuberay-helm/tree/main/helm-chart/kuberay-operator).

## 2. Build a Custom Docker Image
You need to build a custom Docker image containing your local development copy of DataFusion Ray rather than using the default PyPi release.

Run the following command to build your Docker image:

```shell
docker build -t [YOUR_IMAGE_NAME]:[YOUR_TAG] -f k8s/Dockerfile .
```
After building the image, push it to a container registry accessible by your Kubernetes cluster.

## 3. Deploy a RayCluster
Next, deploy a RayCluster using the custom image.

```shell
helm repo update
helm install datafusion-ray kuberay/ray-cluster \
--set 'image.repository=[YOUR_REPOSITORY]' \
--set 'image.tag=[YOUR_TAG]' \
--set 'imagePullPolicy=Always'
```
Make sure you replace *[YOUR_REPOSITORY]* and *[YOUR_TAG]* with your actual container registry and image tag values.

You can further customize RayCluster settings (such as resource allocations, autoscaling, and more).
For full configuration options, refer to the [RayCluster Helm Chart documentation](https://github.com/ray-project/kuberay-helm/tree/main/helm-chart/ray-cluster).

## 4. Port Forwarding

To access Ray's dashboard, set up port forwarding between your local machine and the Ray cluster's head node:

```shell
kubectl port-forward service/raycluster-kuberay-head-svc 8265:8265
```

This makes Ray’s dashboard and API available at `http://127.0.0.1:8265`.


## 5. Run an Example
From the examples directory in your project, you can run a sample job using the following commands:

```
export RAY_ADDRESS="http://127.0.0.1:8265"
ray job submit --working-dir ./examples/ -- python3 tips.py
```

### Expected output:
5 changes: 1 addition & 4 deletions examples/tips.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,8 @@

SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__))

# Start a local cluster
ray.init(resources={"worker": 1})

# Connect to a cluster
# ray.init()
ray.init()

# Create a context and register a table
ctx = DatafusionRayContext(2)
Expand Down
34 changes: 34 additions & 0 deletions k8s/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
FROM rayproject/ray:2.37.0.cabc24-py312

RUN sudo apt update && \
sudo apt install -y curl build-essential

# Intall Rust
RUN curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain stable -y

WORKDIR /home/ray

# install dependencies
COPY requirements-in.txt /home/ray/
RUN python3 -m venv venv && \
source venv/bin/activate && \
pip3 install -r requirements-in.txt

# add sources
RUN mkdir /home/ray/src
RUN mkdir /home/ray/datafusion_ray
COPY src /home/ray/src/
COPY datafusion_ray /home/ray/datafusion_ray/
COPY pyproject.toml /home/ray/
COPY Cargo.* /home/ray/
COPY build.rs /home/ray/
COPY README.md /home/ray/

# build datafusion_ray
RUN source venv/bin/activate && \
source /home/ray/.cargo/env && \
maturin build --release

FROM rayproject/ray:2.37.0.cabc24-py312
COPY --from=0 /home/ray/target/wheels/datafusion_ray-0.6.0-cp38-abi3-manylinux_2_35_x86_64.whl /home/ray/datafusion_ray-0.6.0-cp38-abi3-manylinux_2_35_x86_64.whl
RUN pip3 install /home/ray/datafusion_ray-0.6.0-cp38-abi3-manylinux_2_35_x86_64.whl

0 comments on commit 880544a

Please sign in to comment.