Commit
Add some scripts to test fio and dfs llm cache on kubernetes cluster. (#1883)

Fixes #1879

Signed-off-by: Ye Cao <[email protected]>
dashanji authored May 16, 2024
1 parent 44458b4 commit 397d274
Showing 13 changed files with 864 additions and 0 deletions.
9 changes: 9 additions & 0 deletions modules/llm-cache/tests/k8s-test/Dockerfile.master
@@ -0,0 +1,9 @@
FROM python:3.10

WORKDIR /

COPY master.py /master.py

RUN pip3 install kubernetes

CMD ["python3", "master.py"]
17 changes: 17 additions & 0 deletions modules/llm-cache/tests/k8s-test/Dockerfile.worker
@@ -0,0 +1,17 @@
FROM ghcr.io/v6d-io/v6d/vineyard-python-dev:latest_x86_64 as builder

FROM python:3.10

WORKDIR /

COPY worker.py /worker.py
COPY --from=builder /tmp/vineyard_llm-0.22.1-py3-none-any.whl vineyard_llm-0.22.1-py3-none-any.whl

RUN apt update && \
    apt install fio -y

RUN pip3 install vineyard /vineyard_llm-0.22.1-py3-none-any.whl && \
    pip3 install networkx==3.1 && \
    pip3 install numpy

CMD ["python3", "worker.py"]
7 changes: 7 additions & 0 deletions modules/llm-cache/tests/k8s-test/Makefile
@@ -0,0 +1,7 @@
registry = registry.cn-wulanchabu.aliyuncs.com/vineyard
build-images:
	docker build -t ${registry}/fs-llm-master:latest -f ./Dockerfile.master .
	docker build -t ${registry}/fs-llm-worker:latest -f ./Dockerfile.worker .
push-images:
	docker push ${registry}/fs-llm-master:latest
	docker push ${registry}/fs-llm-worker:latest
91 changes: 91 additions & 0 deletions modules/llm-cache/tests/k8s-test/README.md
@@ -0,0 +1,91 @@
## Run llm test on k8s

This document describes how to run the llm test on a Kubernetes cluster.

### Tokenize the prompt file

Suppose you have a [prompt file](./prompt-samples.txt) that contains conversations between the user and the chatbot. You can tokenize the prompt file by running the following command:

```bash
$ python tokenize_prompt.py --prompt-file prompt-samples.txt --file-num 1
```

After running the command, you will get a tokenized prompt file named `tokens_0` under the `small_files` directory.

```bash
$ ls small_files
prompts_0.txt tokens_0
```

Also, you can set `--file-num` to the number of files you want to split the prompt file into. If the prompt file is too large, splitting it into multiple files allows each file to be processed in parallel.

```bash
$ python tokenize_prompt.py --prompt-file prompt-samples.txt --file-num 2
$ ls small_files
prompts_0.txt prompts_1.txt tokens_0 tokens_1
```

At this point, you can upload these token files to the OSS bucket or NAS; refer to [ossutil upload files](https://help.aliyun.com/zh/oss/user-guide/upload-objects-to-oss/?spm=a2c4g.11186623.0.0.4b471c22sHG1EG) or [NAS mount](https://help.aliyun.com/zh/nas/user-guide/mount-an-nfs-file-system-on-a-linux-ecs-instance?spm=a2c4g.11186623.0.0.15713eedDgiEYF).
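
For example, with the ossutil CLI (the bucket name and prefix below are hypothetical placeholders, not values from this repository):

```bash
# Upload the tokenized files recursively; replace the bucket and prefix with your own.
$ ossutil cp -r small_files/ oss://your-bucket/tokens/
```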

### Build the master and worker images

Before building the master and worker images, you need to build the vineyard-python-dev image first, as it provides the llm-cache PyPI package.

```bash
$ cd v6d && make -C docker vineyard-python-dev
```

Then, you can build the master and worker images by running the following command:

> Make sure the image registry is set correctly.
```bash
$ cd modules/llm-cache/tests/k8s-test
$ make build-images
```
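
Since `registry` is a plain Make variable, it can also be overridden on the command line when you push to your own registry (the value below is a hypothetical example):

```bash
$ make build-images registry=registry.example.com/your-namespace
```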

Next, push the images to the registry:

```bash
$ make push-images
```

### Deploy on the k8s cluster

#### Create the OSS volume

Assuming the token files have been uploaded to the OSS bucket, we first need to [create the OSS secret and OSS volume](https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/mount-statically-provisioned-oss-volumes#title-hos-c75-12q).
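
As a sketch, the secret holds your AccessKey pair; the key names `akId` and `akSecret` follow the linked ACK document, so double-check them against your CSI plugin version:

```bash
# Hypothetical values; substitute your own AccessKey pair.
$ kubectl create secret generic oss-secret \
    --from-literal=akId=<your-access-key-id> \
    --from-literal=akSecret=<your-access-key-secret>
```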

#### Create the Distributed FileSystem Volume

The DFS could be NAS or CPFS; refer to [Mount NAS Volume on ACK](https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/mount-statically-provisioned-nas-volumes?spm=a2c4g.11186623.0.0.b7c130b7eJHcnf) or [Mount CPFS Volume on ACK](https://help.aliyun.com/zh/ack/ack-managed-and-ack-dedicated/user-guide/statically-provisioned-cpfs-2-0-volumes-1?spm=a2c4g.11186623.0.0.399a22dbapWWsP) to create the volume.
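
Whichever you choose, make sure the resulting PVC is `Bound` before deploying the worker (assuming `nas-csi-pvc` is the claim name, as referenced in the next step):

```bash
$ kubectl get pvc nas-csi-pvc
```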

#### Deploy the worker

After preparing the OSS volume and the DFS volume, you need to change the NFS volume name `nas-csi-pvc` to the DFS volume you created before.

> **Note:** The CPU resources are important for worker performance; you can adjust `resources.requests.cpu` to get better performance.

Then deploy the worker by running the following command:

```bash
$ kubectl apply -f yamls/worker.yaml
```
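
You can verify that the worker pods are running before moving on; `app=fs-llm-test-worker` is the default label selector that `master.py` uses to discover the workers:

```bash
$ kubectl get pods -l app=fs-llm-test-worker -o wide
```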

#### Deploy the master

After deploying the worker, you need to change the `TOKENS_FILE_NUM` environment variable in the `yamls/master.yaml` file to the number of token files you put in the OSS bucket. Also, set the OSS volume claim name `oss-pvc` to the OSS volume you created.

Then deploy the master by running the following command:

```bash
$ kubectl apply -f yamls/master.yaml
```
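
To follow the distribution progress, tail the master pod's logs (the pod name below is a placeholder; look it up first):

```bash
$ kubectl get pods
$ kubectl logs -f <master-pod-name>
```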

### Show the result

After running the llm test, you can check the result by running the following command:

```bash
$ python show_result.py --kubeconfig-path /your/kubeconfig --label-selector your_label_key=your_label_value
```
51 changes: 51 additions & 0 deletions modules/llm-cache/tests/k8s-test/master.py
@@ -0,0 +1,51 @@
import socket
import random
import os
import time
from multiprocessing import Pool
from kubernetes import client, config

def get_pod_ips(label_selector):
    # Discover the worker pod IPs via the in-cluster Kubernetes API.
    config.load_incluster_config()
    api = client.CoreV1Api()
    pods = api.list_pod_for_all_namespaces(label_selector=label_selector)
    pod_ip_list = []
    for pod in pods.items:
        pod_ip_list.append(pod.status.pod_ip)
    return pod_ip_list

def distribute_prompts(args):
    file_name, server_ips = args
    # Read the tokenized prompt file line by line.
    token_list = []
    with open(f'{file_name}', 'r', encoding='utf-8') as f:
        while True:
            line = f.readline()
            if not line:
                break
            token_list.append(line)

    # Send each token line to a randomly chosen worker, retrying until it succeeds.
    for token in token_list:
        server_ip = random.choice(server_ips)
        # time.sleep(random.randint(1, 200) / 1000000)
        while True:
            try:
                send_tokens_to_server(server_ip, 8888, token)
                break
            except Exception as e:
                print(f"Error: {e}")
                time.sleep(1)
                continue

def send_tokens_to_server(server_address, server_port, tokens):
    # One short-lived TCP connection per token line.
    clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clientsocket.connect((server_address, server_port))
    clientsocket.send(tokens.encode('utf-8'))
    clientsocket.close()

if __name__ == "__main__":
    # One process per token file distributes its lines to the workers in parallel.
    file_num = int(os.environ.get('TOKENS_FILE_NUM', 16))
    file_names = [f'/tokens/tokens_{i}' for i in range(file_num)]
    pod_selector = os.environ.get('POD_SELECTOR', 'app=fs-llm-test-worker')
    server_ips = get_pod_ips(pod_selector)
    with Pool(file_num) as p:
        p.map(distribute_prompts, [(file_name, server_ips) for file_name in file_names])
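
The remaining files in this commit, including `worker.py`, are not expanded above. Purely to illustrate the wire protocol that `master.py` speaks (one UTF-8 token line per short-lived TCP connection on port 8888), here is a minimal, hypothetical receiver loop; it is a sketch, not the actual `worker.py`:

```python
import socket

def serve(host="0.0.0.0", port=8888):
    # Accept one connection per token line, mirroring master.py's
    # connect/send/close pattern.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(128)
    while True:
        conn, _ = server.accept()
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # sender closed the socket, so the line is complete
                break
            chunks.append(data)
        conn.close()
        tokens = b"".join(chunks).decode("utf-8")
        # A real worker would feed `tokens` into the llm-cache lookup/update here.
        print(f"received {len(tokens)} characters")

if __name__ == "__main__":
    serve()
```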