Release of eval scripts
GregBowyer committed Dec 13, 2023
1 parent a50e394 commit f016725
Showing 6 changed files with 354 additions and 0 deletions.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
102 changes: 102 additions & 0 deletions README.md
@@ -0,0 +1,102 @@
# Eval Scripts

## TL;DR

These are the _very basic_ scripts for running competition submissions. The scripts herein are very dumb, intentionally so. PRs welcome for better scripts!

## Setup

The following tools need to be set up on an evaluation machine.

* `docker` and `nvidia-docker` for running the helm suite and the submissions
* `git` for cloning repos
* `curl` for running the healthchecks to ensure containers are running

With these installed, run the `setup.sh` script with the number of GPUs in the evaluation machine:

```sh
./setup.sh $NUM_GPUS
```

For example, to use 8 GPUs on an eval machine:

```sh
./setup.sh 8
```

### What this does
This script ensures a few tools are present and functional, and sets up a degree of isolation on the machine in the form of Docker networks and port exposures for each individual GPU.

We also check out the helm evaluation version used in the competition into the `./private-helm` folder.
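The network setup can be sketched as follows; this mirrors `setup_docker_network` in `setup.sh`, with `echo` standing in for the real `docker network create` call so the sketch runs without Docker:

```shell
# Sketch of the per-GPU isolation setup.sh creates: one docker network per
# GPU index, named llm_eval_1 .. llm_eval_N (echo stands in for docker here).
num_gpus=8
for isolation in $(seq 1 "$num_gpus"); do
  echo "docker network create llm_eval_$isolation"
done
```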

## Running a submission
Make sure the submissions are in the `./submissions` folder relative to this script.

To run a single submission do:

```sh
./eval-repo.sh \
'$gpu_device' \
'$isolation' \
'$hardware_track' \
'$helm_config' \
'$submission'
```

What this does:

* `'$gpu_device'` Specifies the GPU string to pass to Docker for the GPU(s) to run the submission on
* `'$isolation'` An isolation factor used to divide submissions between multiple GPUs on a single server. We recommend keeping this number the same as the `$gpu_device` string.
* `'$hardware_track'` The hardware track to build for in the submissions folder. Essentially a top-level folder in `./submissions` that differentiates between hardware tracks.
* `'$helm_config'` The config used for helm; it must be at a path visible to the helm container. Private-helm contains configs within the container for the 111 competition.
* `'$submission'` The submission folder to build

The `$submission` is the folder that contains the submission to evaluate. A bare `Dockerfile` is expected at this location to build the submission container.
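Per the healthcheck in `eval-repo.sh`, the built container is expected to serve HTTP on port 80 and answer `POST /process`; the host port it is published on is `808$isolation`. A sketch of the request the harness issues, shown as an `echo` so nothing needs to be running:

```shell
# Sketch of the healthcheck request run_submission in eval-repo.sh issues.
# With isolation factor 1 the host port is 8081 ("808$isolation").
isolation=1
port="808$isolation"
body='{"prompt": "The capital of France is "}'
echo "curl -X POST -H 'Content-Type: application/json' -d '$body' http://localhost:$port/process"
```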

### Layout of the submissions folder
Submissions are laid out in `./submissions` folder in the following fashion:

```
./submissions
├── 4090
│   └── $user
│       └── $repo
│           ├── README.md
│           ├── submission_1
│           │   ├── Dockerfile
│           │   └── ...
│           └── submission_2
│               ├── Dockerfile
│               └── ...
└── A100
    └── $user
        └── $repo
            ├── README.md
            ├── submission_1
            │   ├── Dockerfile
            │   └── ...
            └── submission_2
                ├── Dockerfile
                └── ...
```
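To scaffold this layout for a new submission, something like the following works (`alice` and `repo` are hypothetical stand-ins for the `$user`/`$repo` placeholders above):

```shell
# Scaffold the expected ./submissions layout for one hypothetical user/repo,
# with one submission folder per hardware track.
for track in 4090 A100; do
  mkdir -p "./submissions/$track/alice/repo/submission_1"
  touch "./submissions/$track/alice/repo/submission_1/Dockerfile"
done
```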

### Putting this together for a simple run

Using the above layout, to run the A100 submission `$user/$repo/submission_2` on the second GPU you would do the following:

```sh
./eval-repo.sh \
'device=1' \
'1' \
'A100' \
'/helm/config/some_helm_config.conf' \
'$user/$repo/submission_2'
```
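After a run, build and run logs land in `./benchmark-results` (`$OUT_DIR` in `utils.sh`), named after a normalized form of the submission path: slashes and dashes become underscores and letters are lower-cased (see `submission_name` in `eval-repo.sh`). A sketch for a hypothetical submission `alice/My-Repo/submission_2`:

```shell
# Mirror of submission_name in eval-repo.sh: normalize a submission path into
# the docker image / log file name.
sub='alice/My-Repo/submission_2'
sub_name="$(echo "${sub//\//_}" | tr '-' '_' | tr '[:upper:]' '[:lower:]')"
echo "build log: ./benchmark-results/$sub_name-build.log"
echo "run log:   ./benchmark-results/$sub_name-run.log"
```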

## Changing locations
Should you want to change the locations of the `./private-helm`, `./submissions` or `./results` folders, edit the following scripts:

* For `./submissions` change `$BASE_SUB_DIR` in [`utils.sh`](utils.sh)
* For `./results` change `$OUT_DIR` in [`utils.sh`](utils.sh)
* For `./private-helm` change [`build-eval-container.sh`](build-eval-container.sh)
4 changes: 4 additions & 0 deletions build-eval-container.sh
@@ -0,0 +1,4 @@
#!/usr/bin/env bash

cd private-helm || exit
docker build -t llm-eval .
152 changes: 152 additions & 0 deletions eval-repo.sh
@@ -0,0 +1,152 @@
#!/usr/bin/env bash

source ./utils.sh

cleanup() {
local llm_eval_container
llm_eval_container="$(cat "$PID_DIR/llm_docker.name")"
docker stop "$llm_eval_container"

local sub_name
sub_name="$(cat "$PID_DIR/submission_docker.name")"
docker stop "$sub_name"

kill -15 "$(cat "$PID_DIR/submission_log.pid")"
sleep 10

docker rmi "$sub_name"
}

trap cleanup EXIT
trap cleanup SIGINT

submission_name() {
echo "${1//\//_}" | tr '-' '_' | tr '[:upper:]' '[:lower:]'
}

gaurentee_dirs() {
mkdir -p "$PID_DIR"
mkdir -p "$BASE_SUB_DIR"
mkdir -p "$OUT_DIR"
}

build_submission() {
local hardware_track="$1"
local submission="$2"

local sub_name
sub_name=$(submission_name "$submission")

enter "$BASE_SUB_DIR/$hardware_track/$submission"

docker build -t "$sub_name" . 2>&1 \
| tee "$OUT_DIR/$sub_name-build.log" \
|| die "Could not build $sub_name"

leave
}

healthcheck() {
local port="$1"

local max_retries=10 # Maximum number of retries
local retry_delay=120 # Delay between retries in seconds

local url="http://localhost:$port/process"
local data='{"prompt": "The capital of France is "}'
local accept='Content-Type: application/json'

for ((i = 0; i < max_retries; i++)); do
sleep $retry_delay
if curl -q -X POST -H "$accept" -d "$data" "$url" ; then
return 0
else
echo "Retrying healthcheck (Attempt $i)..."
fi
done

die "Could not healthcheck after $max_retries retries"
}

run_submission() {
local submission="$1"
local isolation="$2"
local gpus="$3"

local network
local sub_name

network="llm_eval_$isolation"
sub_name=$(submission_name "$submission")

local port="808$isolation"

docker run \
-d \
--rm \
--name "$sub_name" \
--network "$network" \
--runtime=nvidia \
--gpus "$gpus" \
-p "$port:80" \
"$sub_name" || die "Could not run $sub_name"

echo "$sub_name" > "$PID_DIR/submission_docker.name"

( docker logs -f "$sub_name" > "$OUT_DIR/$sub_name-run.log" 2>&1 ) > /dev/null &
echo "$!" > "$PID_DIR/submission_log.pid"

healthcheck "$port"
}

get_ip() {
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1"
}

run_helm() {
local submission="$1"
local isolation="$2"
local config="$3"

local sub_name
local ip
local llm_eval_name="llm_eval_${isolation}"

sub_name=$(submission_name "$submission")
ip="$(get_ip "$sub_name")"

echo "$llm_eval_name" > "$PID_DIR/llm_docker.name"

docker run \
--rm \
--name "$llm_eval_name" \
--env HELM_HTTP_MODEL_BASE_URL="http://$ip" \
--network "$llm_eval_name" \
-v "$OUT_DIR:/results" \
llm-eval \
/helm/do-run.sh "$config" "$sub_name" || die "Could not run helm"
}

main() {
local gpus="$1"
local isolation="$2"
local hardware_track="$3"
local config="$4"
local submission="$5"

if [[ $# != 5 ]]; then
echo "Usage: $0 gpu-spec isolation hardware-track config submission"
exit 1
fi

# Isolate for specific runs on multi-GPU machines
export PID_DIR="$EVAL_ROOT/state/$isolation"

gaurentee_dirs

build_submission "$hardware_track" "$submission"
run_submission "$submission" "$isolation" "$gpus"
run_helm "$submission" "$isolation" "$config"
}

main "$@"
39 changes: 39 additions & 0 deletions setup.sh
@@ -0,0 +1,39 @@
#!/usr/bin/env bash

source ./utils.sh

git_clone() {
git clone "$1"
}

setup_docker_network() {
local num_gpus="$1"
for isolation in $(seq 1 "$num_gpus"); do
if ! docker network inspect "llm_eval_$isolation" > /dev/null 2>&1 ; then
docker network create "llm_eval_$isolation" || die "Unable to create llm-eval docker network"
fi
done
}

main() {
if [[ $# -ne 1 ]]; then
die "Usage: $0 number-gpus"
fi

check_cmd curl
check_cmd docker
check_cmd git

git_clone "[email protected]:llm-efficiency-challenge/private-helm.git"

enter private-helm
git checkout neurips_eval
leave

./build-eval-container.sh || die "Cannot build the eval container"
setup_docker_network "$1"

echo "Make sure submissions are present in ./submissions"
}

main "$@"
36 changes: 36 additions & 0 deletions utils.sh
@@ -0,0 +1,36 @@
#!/usr/bin/env bash
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

export EVAL_ROOT="$SCRIPT_DIR"
export BASE_SUB_DIR="$EVAL_ROOT/submissions"
export OUT_DIR="$EVAL_ROOT/benchmark-results"

die() {
local sub_name
sub_name="$(cat "$PID_DIR/submission_docker.name" 2> /dev/null || echo "" )"

if [[ "$sub_name" ]]; then
echo "$sub_name" >> "$EVAL_ROOT/failures.txt"
else
echo "unknown submission" >> "$EVAL_ROOT/failures.txt"
fi

echo "$1"

exit 1
}

enter() {
pushd "$1" > /dev/null || die "Could not enter $1"
}

leave() {
popd > /dev/null || die "Could not exit"
}

check_cmd() {
if ! command -v "$1" &> /dev/null; then
echo "$1 could not be found; please install it"
exit 1
fi
}
