1 parent a50e394, commit f016725
Showing 6 changed files with 354 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2023 Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
# Eval Scripts | ||
|
||
## TL;DR | ||
|
||
These are the _very basic_ scripts for running submissions of the competition. The scripts herein are very dumb, intentionally so. PRs welcome for better scripts! | ||
|
||
## Setup | ||
|
||
The following tools need to be set up on an evaluation machine: | ||
|
||
* `docker` and `nvidia-docker` for running the helm suite and the submissions | ||
* `git` for cloning repos | ||
* `curl` for running the healthchecks to ensure containers are running | ||
|
||
With these installed, run the `setup.sh` script with the number of GPUs in the evaluation machine: | ||
|
||
```sh | ||
./setup.sh $NUM_GPUS | ||
``` | ||
|
||
For example, to use 8 GPUs in an eval machine: | ||
|
||
```sh | ||
./setup.sh 8 | ||
``` | ||
|
||
### What this does | ||
This script checks that the required tools are present and functional, and sets up a degree of isolation on the machine. | ||
The isolation takes the form of one docker network per GPU (`llm_eval_1` through `llm_eval_$NUM_GPUS`); the matching host port is published later, when a submission is run. | ||
|
||
We also check out the helm evaluation version used in the competition (the `neurips_eval` branch of `private-helm`) into the `./private-helm` folder and build the `llm-eval` container from it. | ||
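As a rough illustration of the per-GPU isolation, the dry-run sketch below prints the network and host port that each isolation value maps to. The network names follow `setup_docker_network` in `setup.sh` and the port pattern follows `run_submission` in `eval-repo.sh`; the `describe_isolation` helper itself is hypothetical and never calls docker.

```sh
#!/usr/bin/env bash
# describe_isolation is a hypothetical helper, not part of the repository.
# It only prints names; it does not create networks or publish ports.
describe_isolation() {
    local num_gpus="$1"
    local isolation
    for isolation in $(seq 1 "$num_gpus"); do
        # setup.sh creates the network llm_eval_<isolation>;
        # eval-repo.sh publishes container port 80 on host port 808<isolation>.
        echo "isolation=$isolation network=llm_eval_$isolation port=808$isolation"
    done
}

describe_isolation 2
# isolation=1 network=llm_eval_1 port=8081
# isolation=2 network=llm_eval_2 port=8082
```

Note that the isolation values start at 1, because `setup.sh` iterates over `seq 1 "$num_gpus"`.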
|
||
## Running a submission | ||
Make sure the submissions are in the `./submissions` folder relative to this script. | ||
|
||
To run a single submission do: | ||
|
||
```sh | ||
./eval-repo.sh \ | ||
'$gpu_device' \ | ||
'$isolation' \ | ||
'$hardware_track' \ | ||
'$helm_config' \ | ||
'$submission' | ||
``` | ||
|
||
What this does: | ||
|
||
* `'$gpu_device'` The GPU string passed to docker (for example `device=1`) to select the GPU(s) the submission runs on | ||
* `'$isolation'` An isolation factor used to divide submissions between multiple GPUs on a single server. It selects the `llm_eval_$isolation` docker network and the host port `808$isolation`. We recommend keeping this number the same as the device index in the `$gpu_device` string. | ||
* `'$hardware_track'` The hardware track to build for. Essentially a top-level folder in `./submissions` that differentiates between hardware tracks. | ||
* `'$helm_config'` The config used for helm; it must be at a path visible to the helm container. Private-helm contains configs within the container for the competition. | ||
* `'$submission'` The submission folder to build, relative to the hardware track folder | ||
|
||
The `$submission` is the folder that contains the submission to evaluate. A bare `Dockerfile` is expected at this location to build the submission's container. | ||
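The docker image tag and the log file names are derived from the `$submission` path. The sketch below copies the `submission_name` function from `eval-repo.sh` and applies it to a made-up path; the user and repo names are placeholders, not real submissions.

```sh
#!/usr/bin/env bash
# Copied from eval-repo.sh: turn a submission path into a docker-friendly name
# by replacing '/' and '-' with '_' and lower-casing the result.
submission_name() {
    echo "${1//\//_}" | tr '-' '_' | tr '[:upper:]' '[:lower:]'
}

# 'Some-User/My-Repo/submission_2' is a made-up example path.
sub_name=$(submission_name 'Some-User/My-Repo/submission_2')
echo "$sub_name"            # some_user_my_repo_submission_2
echo "$sub_name-build.log"  # build log name, written under $OUT_DIR
echo "$sub_name-run.log"    # container log name, written under $OUT_DIR
```

Two submission paths that differ only in case or in `-` versus `_` would map to the same name, so they should not be evaluated at the same time.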
|
||
### Layout of the submissions folder | ||
Submissions are laid out in the `./submissions` folder in the following fashion: | ||
|
||
``` | ||
./submissions | ||
├── 4090 | ||
│ └── $user | ||
│ └── $repo | ||
│ ├── README.md | ||
│ ├── submission_1 | ||
│ │ ├── Dockerfile | ||
│ │ └── ... | ||
│ └── submission_2 | ||
│ ├── Dockerfile | ||
│ └── ... | ||
└── A100 | ||
└── $user | ||
└── $repo | ||
├── README.md | ||
├── submission_1 | ||
│ ├── Dockerfile | ||
│ └── ... | ||
└── submission_2 | ||
├── Dockerfile | ||
└── ... | ||
``` | ||
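Given this layout, the values accepted as `$submission` can be enumerated by looking for a `Dockerfile` three folders below a hardware track. The following is a minimal sketch under that assumption; `list_submissions` is a hypothetical helper and is not part of these scripts.

```sh
#!/usr/bin/env bash
# list_submissions is a hypothetical helper, not part of the repository.
# It prints one "$user/$repo/$submission" path per Dockerfile found.
list_submissions() {
    local base_dir="$1"        # e.g. ./submissions
    local hardware_track="$2"  # e.g. A100
    local track_dir="$base_dir/$hardware_track"
    local dockerfile

    # A Dockerfile sits exactly four levels below the track folder:
    # $user/$repo/$submission/Dockerfile
    find "$track_dir" -mindepth 4 -maxdepth 4 -type f -name Dockerfile | sort |
        while IFS= read -r dockerfile; do
            dockerfile="${dockerfile#"$track_dir"/}"  # drop the track prefix
            echo "${dockerfile%/Dockerfile}"          # drop the file name
        done
}
```

Calling `list_submissions ./submissions A100` on the tree above would print `$user/$repo/submission_1` and `$user/$repo/submission_2`, each of which can be passed as the last argument of `eval-repo.sh`.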
|
||
### Putting this together for a simple run | ||
|
||
Using the above layout, to run the A100 submission `$user/$repo/submission_2` on the second GPU you would do the following: | ||
|
||
```sh | ||
./eval-repo.sh \ | ||
'device=1' \ | ||
'1' \ | ||
'A100' \ | ||
'/helm/config/some_helm_config.conf' \ | ||
'$user/$repo/submission_2' | ||
``` | ||
|
||
## Changing locations | ||
Should you want to change the locations of the `./private-helm`, `./submissions` or `./benchmark-results` folders, you will need to edit the scripts: | ||
|
||
* For `./submissions` change `$BASE_SUB_DIR` in [`utils.sh`](utils.sh) | ||
* For `./benchmark-results` (mounted as `/results` inside the helm container) change `$OUT_DIR` in [`utils.sh`](utils.sh) | ||
* For `./private-helm` change [`build-eval-container.sh`](build-eval-container.sh) and the clone step in `setup.sh` |
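One way to avoid editing the file for every machine would be to let environment variables override the defaults. This is only a sketch of that idea, not what the committed `utils.sh` does: the committed version assigns the three paths unconditionally.

```sh
#!/usr/bin/env bash
# Sketch only: the committed utils.sh assigns these three paths unconditionally.
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

# Keep any value already present in the environment, otherwise fall back to
# the defaults used by utils.sh.
export EVAL_ROOT="${EVAL_ROOT:-$SCRIPT_DIR}"
export BASE_SUB_DIR="${BASE_SUB_DIR:-$EVAL_ROOT/submissions}"
export OUT_DIR="${OUT_DIR:-$EVAL_ROOT/benchmark-results}"

echo "submissions: $BASE_SUB_DIR"
echo "results:     $OUT_DIR"
```

With such a change, setting `OUT_DIR` in the environment before calling `eval-repo.sh` would redirect logs and results without touching the scripts.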
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
#!/usr/bin/env bash | ||
|
||
cd private-helm || exit | ||
docker build -t llm-eval . |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,152 @@ | ||
#!/usr/bin/env bash | ||
|
||
source ./utils.sh | ||
|
||
cleanup() { | ||
local llm_eval_container | ||
llm_eval_container="$(cat "$PID_DIR/llm_docker.name")" | ||
docker stop "$llm_eval_container" | ||
|
||
local sub_name | ||
sub_name="$(cat "$PID_DIR/submission_docker.name")" | ||
docker stop "$sub_name" | ||
|
||
kill -15 "$(cat "$PID_DIR/submission_log.pid")" | ||
sleep 10 | ||
|
||
docker rmi "$sub_name" | ||
} | ||
|
||
trap cleanup EXIT | ||
trap cleanup SIGINT | ||
|
||
submission_name() { | ||
echo "${1//\//_}" | tr '-' '_' | tr '[:upper:]' '[:lower:]' | ||
} | ||
|
||
gaurentee_dirs() { | ||
mkdir -p "$PID_DIR" | ||
mkdir -p "$BASE_SUB_DIR" | ||
mkdir -p "$OUT_DIR" | ||
} | ||
|
||
build_submission() { | ||
local hardware_track="$1" | ||
local submission="$2" | ||
|
||
local sub_name | ||
sub_name=$(submission_name "$submission") | ||
|
||
enter "$BASE_SUB_DIR/$hardware_track/$submission" | ||
|
||
docker build -t "$sub_name" . 2>&1 \ | ||
| tee "$OUT_DIR/$sub_name-build.log" \ | ||
|| die "Could not build $sub_name" | ||
|
||
leave | ||
} | ||
|
||
healthcheck() { | ||
local port="$1" | ||
|
||
local max_retries=10 # Maximum number of retries | ||
local retry_delay=120 # Delay between retries in seconds | ||
|
||
local url="http://localhost:$port/process" | ||
local data='{"prompt": "The capital of France is "}' | ||
local accept='Content-Type: application/json' | ||
|
||
for ((i = 0; i < max_retries; i++)); do | ||
sleep $retry_delay | ||
if curl -q -X POST -H "$accept" -d "$data" "$url" ; then | ||
return 0 | ||
else | ||
echo "Retrying healthcheck (Attempt $i)..." | ||
fi | ||
done | ||
|
||
die "Could not healthcheck after $max_retries retries" | ||
} | ||
|
||
run_submission() { | ||
local submission="$1" | ||
local isolation="$2" | ||
local gpus="$3" | ||
|
||
local network | ||
local sub_name | ||
|
||
network="llm_eval_$isolation" | ||
sub_name=$(submission_name "$submission") | ||
|
||
local port="808$isolation" | ||
|
||
docker run \ | ||
-d \ | ||
--rm \ | ||
--name "$sub_name" \ | ||
--network "$network" \ | ||
--runtime=nvidia \ | ||
--gpus "$gpus" \ | ||
-p "$port:80" \ | ||
"$sub_name" || die "Could not run $sub_name" | ||
|
||
echo "$sub_name" > "$PID_DIR/submission_docker.name" | ||
|
||
( docker logs -f "$sub_name" > "$OUT_DIR/$sub_name-run.log" 2>&1 ) > /dev/null & | ||
echo "$!" > "$PID_DIR/submission_log.pid" | ||
|
||
healthcheck "$port" | ||
} | ||
|
||
get_ip() { | ||
docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' "$1" | ||
} | ||
|
||
run_helm() { | ||
local submission="$1" | ||
local isolation="$2" | ||
local config="$3" | ||
|
||
local sub_name | ||
local ip | ||
local llm_eval_name="llm_eval_${isolation}" | ||
|
||
sub_name=$(submission_name "$submission") | ||
ip="$(get_ip "$sub_name")" | ||
|
||
echo "$llm_eval_name" > "$PID_DIR/llm_docker.name" | ||
|
||
docker run \ | ||
--rm \ | ||
--name "$llm_eval_name" \ | ||
--env HELM_HTTP_MODEL_BASE_URL="http://$ip" \ | ||
--network "$llm_eval_name" \ | ||
-v "$OUT_DIR:/results" \ | ||
llm-eval \ | ||
/helm/do-run.sh "$config" "$sub_name" || die "Could not run helm" | ||
} | ||
|
||
main() { | ||
local gpus="$1" | ||
local isolation="$2" | ||
local hardware_track="$3" | ||
local config="$4" | ||
local submission="$5" | ||
|
||
# Isolate for specific runs on multi-gpus | ||
export PID_DIR="$EVAL_ROOT/state/$isolation" | ||
|
||
if [[ $# != 5 ]]; then | ||
echo "Usage: $0 gpu-spec isolation hardware-track config submission" | ||
exit 1 | ||
fi | ||
|
||
gaurentee_dirs | ||
|
||
build_submission "$hardware_track" "$submission" | ||
run_submission "$submission" "$isolation" "$gpus" | ||
run_helm "$submission" "$isolation" "$config" | ||
} | ||
|
||
main "$@" |
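The `healthcheck` function above sleeps for the full delay before its first attempt and hard-codes the retry count and delay. As a possible refinement, the loop could be factored into a generic helper that tries first and only sleeps between attempts. The `retry` function below is a hypothetical sketch of that idea, not code from this commit.

```sh
#!/usr/bin/env bash
# retry is a hypothetical helper; eval-repo.sh hard-codes this loop in healthcheck().
# Usage: retry <max_retries> <delay_seconds> <command> [args...]
retry() {
    local max_retries="$1"
    local retry_delay="$2"
    shift 2

    local attempt
    for ((attempt = 1; attempt <= max_retries; attempt++)); do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        echo "Retrying (attempt $attempt of $max_retries failed)..." >&2
        # Only wait if another attempt will follow.
        if ((attempt < max_retries)); then
            sleep "$retry_delay"
        fi
    done
    return 1
}
```

With this helper, the check in `healthcheck` could be written as `retry 10 120 curl -q -X POST -H "$accept" -d "$data" "$url" || die "Could not healthcheck"`, reusing the variables that function already defines.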
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
#!/usr/bin/env bash | ||
|
||
source ./utils.sh | ||
|
||
git_clone() { | ||
git clone "$1" | ||
} | ||
|
||
setup_docker_network() { | ||
local num_gpus="$1" | ||
for isolation in $(seq 1 "$num_gpus"); do | ||
if ! docker network inspect "llm_eval_$isolation" > /dev/null 2>&1 ; then | ||
docker network create "llm_eval_$isolation" || die "Unable to create llm-eval docker network" | ||
fi | ||
done | ||
} | ||
|
||
main() { | ||
if [[ $# -ne 1 ]]; then | ||
die "Usage $0: [number-gpus]" | ||
fi | ||
|
||
check_cmd curl | ||
check_cmd docker | ||
check_cmd git | ||
|
||
git_clone "git@github.com:llm-efficiency-challenge/private-helm.git" | ||
|
||
enter private-helm | ||
git checkout neurips_eval | ||
leave | ||
|
||
./build-eval-container.sh || die "Cannot build the eval container" | ||
setup_docker_network "$1" | ||
|
||
echo "Make sure submissions are present in ./submissions" | ||
} | ||
|
||
main "$@" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
#!/usr/bin/env bash | ||
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) | ||
|
||
export EVAL_ROOT="$SCRIPT_DIR" | ||
export BASE_SUB_DIR="$EVAL_ROOT/submissions" | ||
export OUT_DIR="$EVAL_ROOT/benchmark-results" | ||
|
||
die() { | ||
local sub_name | ||
sub_name="$(cat "$PID_DIR/submission_docker.name" 2> /dev/null || echo "" )" | ||
|
||
if [[ "$sub_name" ]]; then | ||
echo "$sub_name" >> "$EVAL_ROOT/failures.txt" | ||
else | ||
echo "$sub_name" >> "$EVAL_ROOT/successes.txt" | ||
fi | ||
|
||
echo "$1" | ||
|
||
exit 1 | ||
} | ||
|
||
enter() { | ||
pushd "$1" > /dev/null || die "Could not enter $1" | ||
} | ||
|
||
leave() { | ||
popd > /dev/null || die "Could not exit" | ||
} | ||
|
||
check_cmd() { | ||
if ! command -v "$1" &> /dev/null; then | ||
echo "$1 could not be found; please install it" | ||
exit 1 | ||
fi | ||
} |