Add containers workflow with DeepSpeed MII container
richiejp committed Apr 16, 2024
1 parent d1234e9 commit b4eff3a
Showing 5 changed files with 173 additions and 0 deletions.
67 changes: 67 additions & 0 deletions .github/workflows/containers.yaml
@@ -0,0 +1,67 @@
name: Docker Image

on:
  push:
    tags:
      - '*/v*'
  pull_request:
    branches:
      - 'main'
env:
  IMAGE_DOMAIN: premai
jobs:
  build-and-push:
    runs-on: Ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        # ADD NEW Dockerfile directories HERE!!!
        contexts: [deepspeed-mii]
    steps:
      - name: Install dependencies
        if: startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))
        run: |
          sudo apt-get update \
            && sudo apt-get install -y software-properties-common \
            && sudo apt-get update \
            && sudo add-apt-repository -y ppa:git-core/ppa \
            && sudo apt-get update \
            && sudo apt-get install -y git wget make curl
          curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
      - uses: actions/setup-go@v4
        if: startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))
        with:
          go-version: '>=1.17.0'
      - name: Checkout repository
        if: startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))
        uses: actions/checkout@v4
      - name: Get tags
        if: startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))
        run: git fetch --tags origin --force
      - name: Log in to the Container registry
        if: startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))
        uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_PASSWORD }}
      - name: Extract metadata (tags, labels) for Docker
        if: startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: ${{ env.IMAGE_DOMAIN }}/${{ matrix.contexts }}
          tags: |
            type=ref,event=branch,suffix=-{{date 'YYYYMMDDHHmmss'}}
            type=sha,suffix=-{{date 'YYYYMMDDHHmmss'}}
            type=match,pattern=${{ matrix.contexts }}/v(\d+\.\d+\.\d+),group=1
          flavor: |
            prefix=
            suffix=
      - name: Build and push Docker image
        if: startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))
        uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
        with:
          context: ./containers/${{ matrix.contexts }}
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
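Every step above is gated on the same startsWith() condition, and the image version comes from the type=match rule. A minimal Python sketch approximating both, assuming a tag of the form deepspeed-mii/v0.2.0; step_should_run and image_version are hypothetical helper names, and this is not how GitHub Actions or docker/metadata-action evaluate these rules internally.

```python
import re
from typing import Optional


def step_should_run(github_ref: str, context: str) -> bool:
    """Mirror of startsWith(github.ref, format('refs/tags/{0}/v', matrix.contexts))."""
    return github_ref.startswith(f"refs/tags/{context}/v")


def image_version(tag_name: str, context: str) -> Optional[str]:
    r"""Approximate type=match,pattern=<context>/v(\d+\.\d+\.\d+),group=1."""
    # Unanchored search over the tag name (refs/tags/ already stripped);
    # group 1 is the bare version that becomes the image tag.
    match = re.search(re.escape(context) + r"/v(\d+\.\d+\.\d+)", tag_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(step_should_run("refs/tags/deepspeed-mii/v0.2.0", "deepspeed-mii"))  # True
    print(step_should_run("refs/pull/1/merge", "deepspeed-mii"))               # False
    print(image_version("deepspeed-mii/v0.2.0", "deepspeed-mii"))              # 0.2.0
```

One consequence: on pull_request events github.ref has the form refs/pull/<number>/merge, so the condition is false and every step is skipped; the push: guard on the last step only matters if that condition is ever relaxed.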
27 changes: 27 additions & 0 deletions containers/deepspeed-mii/0001-Add-kubernetes-health-check-route-to-REST-server.patch
@@ -0,0 +1,27 @@
From f5ef85e1ef2189bc04c0b7ae147b147651376885 Mon Sep 17 00:00:00 2001
From: Richard Palethorpe <[email protected]>
Date: Fri, 26 Jan 2024 12:50:37 +0000
Subject: [PATCH] Add kubernetes health check route to REST server

---
mii/grpc_related/restful_gateway.py | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mii/grpc_related/restful_gateway.py b/mii/grpc_related/restful_gateway.py
index a5f1692..5b93fea 100644
--- a/mii/grpc_related/restful_gateway.py
+++ b/mii/grpc_related/restful_gateway.py
@@ -40,6 +40,10 @@ def createRestfulGatewayApp(deployment_name, server_thread):
         threading.Thread(target=shutdown, args=(server_thread, )).start()
         return "Shutting down RESTful API gateway server"
 
+    @app.route("/healthz", methods=["GET"])
+    def healthz():
+        return "ok"
+
     api = Api(app)
     path = "/{}/{}".format(RESTFUL_API_PATH, deployment_name)
     api.add_resource(RestfulGatewayService, path)
--
2.42.0
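The new /healthz route returns the plain string "ok", which Flask sends with status 200. A minimal standard-library sketch of a client-side check against it; FakeGateway is a hypothetical stand-in for the patched gateway so the example runs without DeepSpeed-MII. Note that a Kubernetes HTTP probe only checks that the status is in the 200-399 range, whereas this sketch also checks the body.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <url> answers HTTP 200 with the body "ok"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode() == "ok"
    except OSError:
        # Connection refused, timeouts and HTTP errors (404, 500...) all
        # raise OSError subclasses and count as unhealthy.
        return False


class FakeGateway(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the patched Flask app; it only knows /healthz."""

    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), FakeGateway)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    print(probe(f"http://127.0.0.1:{port}/healthz"))  # True
    print(probe(f"http://127.0.0.1:{port}/missing"))  # False
    server.shutdown()
    server.server_close()
```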

37 changes: 37 additions & 0 deletions containers/deepspeed-mii/0001-cuda-Use-MIG-device-handle-to-get-stats.patch
@@ -0,0 +1,37 @@
From 03d35279f22ca6b5efa8a1d8f5a001f5896f238a Mon Sep 17 00:00:00 2001
From: Richard Palethorpe <[email protected]>
Date: Fri, 26 Jan 2024 09:08:07 +0000
Subject: [PATCH] cuda: Use MIG device handle to get stats

If we are using MIG then we don't have permission to access the
overall device stats. We also don't care about the overall free
memory.

This uses the UUID passed to the container, which is the MIG device
when MIG is enabled and the GPU when not, to get the correct type of
handle.
---
accelerator/cuda_accelerator.py | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/accelerator/cuda_accelerator.py b/accelerator/cuda_accelerator.py
index 2d74daec..59c6d5e6 100644
--- a/accelerator/cuda_accelerator.py
+++ b/accelerator/cuda_accelerator.py
@@ -170,10 +170,9 @@ class CUDA_Accelerator(DeepSpeedAccelerator):

     def available_memory(self, device_index=None):
         if pynvml:
-            if device_index is None:
-                device_index = self.current_device()
-            handle = pynvml.nvmlDeviceGetHandleByIndex(self._get_nvml_gpu_id(device_index))
-            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
+            uuid = os.environ['NVIDIA_VISIBLE_DEVICES']
+            mhandle = pynvml.nvmlDeviceGetHandleByUUID(uuid)
+            info = pynvml.nvmlDeviceGetMemoryInfo(mhandle)
             return info.free
         else:
             return self.total_memory(device_index) - self.memory_allocated(device_index)
--
2.42.0
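The patched available_memory() relies on NVIDIA_VISIBLE_DEVICES holding exactly one UUID. The NVIDIA container runtime also accepts values such as "all", "none", a list of indices or several UUIDs, none of which can be resolved as a single UUID. A sketch of a stricter lookup, assuming a single-device container; visible_device_uuid is a hypothetical helper, not part of DeepSpeed or of the patch above.

```python
import os
from typing import Mapping, Optional


def visible_device_uuid(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the single GPU or MIG UUID exposed to this container.

    Hypothetical helper: the patch itself reads NVIDIA_VISIBLE_DEVICES directly
    and assumes it already holds exactly one UUID.
    """
    env = os.environ if env is None else env
    value = env.get("NVIDIA_VISIBLE_DEVICES", "")
    devices = [d.strip() for d in value.split(",") if d.strip()]
    if len(devices) != 1:
        raise RuntimeError(f"expected exactly one visible device, got {value!r}")
    uuid = devices[0]
    # "all", "none", "void" or an index like "0" are valid runtime values but
    # are not UUIDs that NVML can look up.
    if not uuid.startswith(("GPU-", "MIG-")):
        raise RuntimeError(f"{uuid!r} is not a GPU or MIG UUID")
    return uuid


if __name__ == "__main__":
    # Placeholder UUID, not a real device. The returned string is what would be
    # passed to pynvml.nvmlDeviceGetHandleByUUID() in the patched method.
    env = {"NVIDIA_VISIBLE_DEVICES": "MIG-00000000-0000-0000-0000-000000000000"}
    print(visible_device_uuid(env))
```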

25 changes: 25 additions & 0 deletions containers/deepspeed-mii/Dockerfile
@@ -0,0 +1,25 @@
# Taken from https://github.com/microsoft/DeepSpeed/blob/master/docker/Dockerfile
FROM nvidia/cuda:12.2.2-devel-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive

# Taken from https://www.notion.so/premai/Mamba-Deployment-Image-62731eb173f8452ebee8c9e19a0cef1d?pvs=4
RUN apt-get update && apt-get install -y python3 python3-pip

# Taken from DeepSpeed Dockerfile
RUN pip install torch==2.1.2
# DeepSpeed plus precompiled kernels (I hope)
RUN pip install deepspeed-mii==0.2.0 deepspeed-kernels

# Patch DeepSpeed to work with MIG
COPY ./0001-cuda-Use-MIG-device-handle-to-get-stats.patch /tmp/0001.patch
RUN cd /usr/local/lib/python3.8/dist-packages/deepspeed && patch -p1 < /tmp/0001.patch

# Patch DeepSpeed-MII
COPY ./0001-Add-kubernetes-health-check-route-to-REST-server.patch /tmp/0001.patch
RUN cd /usr/local/lib/python3.8/dist-packages && patch -p1 < /tmp/0001.patch

COPY ./serve-rest-api.py ./serve-rest-api.py

ENTRYPOINT [ "python3", "serve-rest-api.py" ]
CMD ["--uri", "microsoft/phi-1_5"]
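Both RUN patch steps hard-code /usr/local/lib/python3.8/dist-packages, which only holds while the base image ships Python 3.8 (the Ubuntu 20.04 default). One alternative, sketched here and not taken from the original Dockerfile, is to ask the interpreter where a package lives with importlib.util.find_spec, which locates a top-level package without importing it. package_dir is a hypothetical helper name.

```python
import importlib.util


def package_dir(name: str) -> str:
    """Return the directory of an installed top-level package without importing it."""
    spec = importlib.util.find_spec(name)
    # Plain modules (e.g. "os") have no submodule_search_locations; missing
    # packages give spec None. Both are rejected.
    if spec is None or not spec.submodule_search_locations:
        raise ModuleNotFoundError(f"{name!r} is not an installed package")
    return list(spec.submodule_search_locations)[0]
```

In the Dockerfile, package_dir("deepspeed") would stand in for the hard-coded deepspeed directory, and the parent directory of package_dir("mii") for the dist-packages directory used with the MII patch.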
17 changes: 17 additions & 0 deletions containers/deepspeed-mii/serve-rest-api.py
@@ -0,0 +1,17 @@
import mii
import time
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--uri', required=True, help='Model URI e.g. microsoft/phi-1_5')

args = parser.parse_args()

client = mii.serve(args.uri,
                   deployment_name="default",
                   enable_restful_api=True,
                   restful_api_port=8080,
                   restful_api_host="0.0.0.0")

while True:
    time.sleep(1000)
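Once the container is running, the gateway started by mii.serve(..., enable_restful_api=True, restful_api_port=8080) can be called over HTTP. A minimal standard-library client sketch; the /mii/<deployment_name> path and the "prompts"/"max_length" body fields are assumptions based on DeepSpeed-MII's REST example and should be checked against the installed 0.2.0 release.

```python
import json
import urllib.request

# Assumed values: port and deployment name mirror serve-rest-api.py; the
# "/mii" prefix is DeepSpeed-MII's REST path as assumed above.
BASE_URL = "http://localhost:8080"
DEPLOYMENT = "default"


def build_generate_request(prompts, max_length=128, base_url=BASE_URL,
                           deployment=DEPLOYMENT):
    """Build (but do not send) a POST request for the MII REST gateway."""
    body = json.dumps({"prompts": list(prompts), "max_length": max_length})
    return urllib.request.Request(
        f"{base_url}/mii/{deployment}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompts, **kwargs):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompts, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With port 8080 published from the container, generate(["DeepSpeed is"]) would return the decoded JSON response.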
